Using IAR Embedded Workbench for Arm and the CMSIS-DSP library

Arm Cortex-M3 and Cortex-M4 processors provide instructions that are useful for signal processing. The Cortex-M4 in particular is designed for DSP applications: it supports SIMD (Single Instruction, Multiple Data) and MAC (multiply-accumulate) instructions. In addition, Cortex-M4F devices have an FPU (floating-point unit) for handling floating-point calculations.

There are several ways to use these instructions, for example assembler routines or intrinsic functions, but one of the most practical approaches is to use the Arm Cortex Microcontroller Software Interface Standard (CMSIS) DSP library. The CMSIS-DSP library is designed for Cortex-M processors and provides optimized functions for digital signal processing, such as matrix functions, statistics functions, and advanced math functions.

A prebuilt CMSIS-DSP library and its source code are provided with IAR Embedded Workbench for Arm. In this article, we take a look at how to use the CMSIS-DSP library together with IAR Embedded Workbench for Arm and how this can improve performance.

Configuring the CMSIS-DSP library

In IAR Embedded Workbench for Arm, you enable the use of the CMSIS-DSP library by first choosing a Cortex-M device, for example the Arm Cortex-M4F device STM32F407ZG.

11.png

Second, set the CMSIS-DSP library option on the General Options > Library Configuration page. This sets the include path for the C preprocessor and adds the prebuilt CMSIS-DSP library to the project.

22.png

These settings are all you need to be able to use CMSIS-DSP from IAR Embedded Workbench for Arm.

Simple test for CMSIS-DSP library

Let’s see how to call a CMSIS-DSP function and how it performs. Here we use the square root function, arm_sqrt_f32, and compare it with the standard math function, sqrt.

The results are identical and correct.

Next, let’s take a look at the performance.

The CYCLECOUNTER register in IAR Embedded Workbench is useful for checking how many cycles the running code has consumed. The CCSTEP register is handy for checking the number of cycles consumed by the last C/C++ source or assembler step.

33.png

Set breakpoints and note the CCSTEP value for the sqrt functions:

digital_signal_processing-4.png

In this case, the CMSIS-DSP square root function is about 14 times faster than the standard math function:

arm_sqrt_f32 : 52 cycles
sqrt : 752 cycles

From this simple example, we can see that CMSIS-DSP is very easy to use and that it improves the performance significantly.

Practical example of FFT

Now, let’s take a look at a more practical example of the CMSIS-DSP library. The Fast Fourier Transform (FFT) is one of the most widely used operations in digital signal processing; it extracts the frequency components from waveform data. IAR Embedded Workbench for Arm includes some CMSIS-DSP demo projects. In the following example, we use an STM32 example project by opening the STSTM32F4xxCMSIS and STM32CMSIS and STM32F4xx stdperiph lib 1.2.0RC2DSP Lib demo project.

digital_signal_processing.png

This workspace includes 11 demo projects.

digital_signal_processing-6.png

digital_signal_processing-7.png

This project includes arm_fft_bin_data.c, which contains an array describing a 10 kHz signal disturbed with white noise.

digital_signal_processing-8.png

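The input to the FFT algorithm must be complex numbers, so the array interleaves real and imaginary parts. The odd-numbered elements (1st, 3rd, 5th, ..., that is, C indices 0, 2, 4, ...) hold the actual sample data, and the even-numbered elements (C indices 1, 3, 5, ...) hold the imaginary parts, which are set to 0.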
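The interleaved layout can be illustrated with a short sketch. The helper name pack_real_as_complex is hypothetical and is not part of the demo project or of CMSIS-DSP; it only shows how a block of real samples maps onto a buffer of real/imaginary pairs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies n real samples into an interleaved complex
   buffer of 2*n floats laid out as re0, im0, re1, im1, ... */
void pack_real_as_complex(const float *real_samples, float *complex_buf, size_t n)
{
    assert(real_samples != NULL && complex_buf != NULL);

    for (size_t i = 0; i < n; i++) {
        complex_buf[2 * i]     = real_samples[i]; /* real part: C index 0, 2, 4, ... */
        complex_buf[2 * i + 1] = 0.0f;            /* imaginary part: C index 1, 3, 5, ... */
    }
}
```

Note that the complex buffer needs twice as many float elements as there are input samples.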

digital_signal_processing-9.png

Input signal disturbed with white noise.

digital_signal_processing-10.png

For real-valued input such as this, the magnitude of the FFT result is symmetric about the midpoint of the spectrum. The output from the FFT demo contains one specific frequency component as well as white noise.

Let’s go back to the main source code and notice we are using four CMSIS-DSP functions.

As the comments indicate, the first function initializes the FFT module, the second performs the actual FFT calculation, the third calculates the magnitude of each FFT bin from the complex result, and the fourth finds the maximum value and its index in the output array.

The results exactly match the spreadsheet chart shown earlier.

digital_signal_processing-11.png

Now, let’s see the performance of each function with CCSTEP.

arm_cfft_radix4_init_f32 : 54 cycles
arm_cfft_radix4_f32 : 100256 cycles
arm_cmplx_mag_f32 : 26913 cycles
arm_max_f32 : 8744 cycles

The total is 135,967 cycles. If the CPU runs at 100 MHz, the total time is about 1.36 ms. At an audio sampling rate of 44 kHz, capturing 2048 samples takes about 46.5 ms. Compared with that figure, the DSP processing is very fast.
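The arithmetic behind this comparison can be written out as a small sketch. The helper names cycles_to_ms and capture_time_ms are hypothetical; the inputs are the cycle counts listed above, the 100 MHz clock, and the 44 kHz sampling rate assumed in the text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: converts a cycle count to milliseconds
   for a given CPU clock frequency in Hz. */
double cycles_to_ms(uint32_t cycles, double cpu_hz)
{
    assert(cpu_hz > 0.0);
    return (double)cycles * 1000.0 / cpu_hz;
}

/* Hypothetical helper: time in milliseconds needed to capture
   num_samples samples at the given sampling rate in Hz. */
double capture_time_ms(uint32_t num_samples, double sample_rate_hz)
{
    assert(sample_rate_hz > 0.0);
    return (double)num_samples * 1000.0 / sample_rate_hz;
}
```

With 54 + 100256 + 26913 + 8744 = 135,967 cycles at 100 MHz, cycles_to_ms gives roughly 1.36 ms, while capture_time_ms for 2048 samples at 44 kHz gives roughly 46.5 ms. The processing therefore uses about 3% of the time it takes to capture the block.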

Let’s change the core to Cortex-M3 and see how the performance changes.

If the CPU runs at 100 MHz, the total time is 22.539 ms, about 16.6 times longer than the 1.36 ms measured on the Cortex-M4F. This shows how well the Cortex-M4 is optimized for DSP applications.

digital_signal_processing-12.png

Conclusion

Cortex-M processors provide high-performance instructions, and the Cortex-M4 in particular supports instructions for DSP applications. To get the most out of them, consider using IAR Embedded Workbench for Arm together with the CMSIS-DSP library. If you cannot find the function you need in the library, you can also refer to the source code under \arm\CMSIS\DSP_Lib\Source in IAR Embedded Workbench for Arm and create your own library.


Source: www.iar.com