Real time.doc

 

EG7017 – Real-time DSP

 

Lectures 1, 2 and 3 – The DSK50 and Lab1

·        Architecture of the C50

·        Assembly and addressing modes of the C50

·        AIC (A/D & D/A chip) of the DSK50

·        Interrupts of the DSK50 (RINT and XINT)

·        Integers, floating-point and Q15 notation

·        FIR digital filters – implementation

 

Lecture 4 – Introduction and definitions

·        What is a real-time system?

·        Soft and hard real-time systems

·        Basic computer architecture

·        Interrupts and interrupt priority

·        Embedded systems

 

Lectures 5 and 6 – issues on real-time systems

·        Real-time design issues

·        Specification and design of RT systems

·        Interrupt loading

·        Real-time multitasking – how can it be achieved?

 

Lectures 7 and 8 – the C6711 DSP processor and its DSK

·        TMS320C6711 DSP chip

·        DSK6711

·        Code Composer Studio

·        DSP BIOS

·        Your laboratory tasks

 

Some definitions

 

What is a real time system?

Definition 1:

A real-time system is one that must process information and produce a response within a specified time, else risk severe consequences, including failure. That is, in a system with a real-time constraint it is no good to have the correct action or the correct answer after a certain deadline: it is either by the deadline or it is useless!

 

Definition 2 (modified from The Oxford Dictionary of Computing):

Any system in which the time at which output is produced is significant. This is usually because the input corresponds to some event in the physical world, and the output has to relate to that same event. The lag from input time to output time must be sufficiently small for acceptable timeliness.

 

Definition 3 (Young 1982):

Any information processing activity or system which has to respond to externally generated input stimuli within a finite and specified period.

 

Examples:

ABS, aircraft control, ticket reservation system at airport, over-temperature monitor in nuclear power station, mobile phone, oven temperature controller, Doppler blood-flow monitor, ECG/arrhythmia monitor.

 

Failure is inability to perform according to specification. In the case of real-time systems the ‘failed’ specification may be lack of correctness or the failure to produce the response by the required time.

 

A real-time system is one whose correctness is based on both the correctness of the outputs and their timeliness. The ‘novelty’ here is that the system is time critical.

 

NB: It does not have to be ‘fast’: the deadline may be days or weeks… For instance, Texas Instruments European Programme Information Centre (epic) responds within 48 hours for literature requests.

 

 

Soft and hard real-time systems

In a loose sense all practical systems can be said to be real-time systems, because they must produce an output or respond to the user’s commands within a reasonable amount of time (an insurance company responding to letters, a word processor displaying what was typed on the screen, mobile phones responding with delays that allow ‘comfortable’ conversation). Systems where ‘uncomfortably’ long response times are a nuisance (Windows 2000 springs to my mind), but which still function even if deadlines are sometimes not met, are called soft real-time systems. Systems where failure to meet response time constraints leads to catastrophic system failure (aircraft crashing, car skidding, patient dying before corrective action is performed) are called hard real-time systems. These are the ones we are interested in.

 

 

Basic computer architecture:

A system can be called a ‘computer’ if it has input, output, central processing unit (CPU) and memory. These are connected via a collection of transmission paths called buses (address, data and control), and energised by the power bus.

 

Von Neumann architecture, Harvard architecture, program memory, data memory and program flow (if … then go to, call, call if).

 

 

Interrupts

An interrupt is a hardware signal that triggers an action. You may say that an interrupt is a ‘function (or subroutine) call initiated by hardware’.

 

Return location, interrupt service routine location (interrupt handler location), interrupt priority, interrupt-based data acquisition

 

 

Embedded systems

 

Embedded systems are often real-time.

 

An embedded system is …

… “a software system that is completely encapsulated by the hardware that it controls”

Laplante 1997.

or

 

… “a system that contains at least one programmable computer (typically in the form of a microcontroller, a microprocessor or a digital signal processor chip) and which is used by individuals who are, in the main, unaware that it is computer based”.

Pont 2002.

 

 

Interrupt loading (or foreground/background systems loading)

 

All real-time solutions can be said to be flavours (or particular cases) of foreground/background systems. The collection of interrupt-driven tasks constitutes the foreground environment, while the main processing happens in the background. An interrupt-only system (such as ‘echoint’ or ‘filt1’ that you used in the C50 lab sessions) is a particular case where the background does nothing.

 

Obviously there is a loading associated with the interrupts (or foreground activities). The loading has to be such that it does not compromise the timeliness of the real-time system.

 

Let’s now calculate what this loading is.

 

Say we have a task that has a processing time Tproc and, for simplicity, let’s assume that the processing time is constant (context independent). A typical example situation could be that where the background task (main code) is to obtain the FFT of an N-length buffer while new samples are collected into an alternative buffer in the foreground (ISR).

 

When the background task runs uninterrupted its processing time flow can be represented by

 

[Figure: timeline of the background task running uninterrupted, shown as a single block of length Tproc]

Now let’s consider the effect of having periodic interrupts (say to read a sample from the A/D converter) at every period T = 1/fsam, and let’s say the interrupt service routine takes a time Tisr to get serviced and return control to the main code. The consequence is that the main FFT processing gets ‘broken up’ in time, as indicated by

 

 


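[Figure: timeline of the background task interrupted every T = 1/fsam by an ISR of length Tisr; the task now completes after a total time Tt]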

 

 

The total processing time (foreground + background) becomes

 

Tt = Tproc + N × Tisr

where N is the number of interruptions that occur in Tt, given by

N = Tt / T = Tt × fsam

therefore

Tt = Tproc + (Tt × fsam) × Tisr

or

Tt (1 – fsam × Tisr) = Tproc

or

Tt = Tproc / (1 – fsam × Tisr)

where Tt is the total computation time, including the ISR, Tproc is the time in which the background (main code) task would be completed if no interruptions had occurred, Tisr is the time necessary to service each interrupt and fsam is the frequency associated with the interrupts.

 

Obviously Tisr must always be shorter than the sampling period, T = 1/fsam. Otherwise the interrupts alone load the processor by 100% or more and the background (main code) task will not run (in effect the processor would crash).

 

This loading might become important even with fast processors if the sampling rate (or interrupt rate, in the more generic case) is high – remember that one must perform context switching in the ISR. This is why DMA is preferred to interrupt-driven sampling: the loading associated with DMA-based sampling is much less than that associated with interrupt-driven sampling.

 

 

Example 1

 

In the 1980s I used a TMS32020 (the great-great grandfather of the TMS320C50) to perform real-time FFTs of blocks of 256 samples of Doppler signals while sampling new values using an interrupt-driven approach. Samples of the Doppler signal were collected at frequencies up to 40.96 kHz and the interrupt service routine I used took Tisr = 5.2 µs. The time needed to compute the FFT of one block of 256 samples was Tproc = 3.14646 ms. For this situation,

 

a)      obtain the total time needed to complete the FFT while sampling at the maximum sampling frequency, fsam = 40.96 kHz,

 

b)      what was the loading in the above situation?

 

c)      what was the maximum sampling rate the system could use?

 

Answers:

 

a)      Tt = Tproc / (1 – fsam × Tisr) = 3.998 ms ≈ 4 ms

 

b)      The FFT must be ready in no more time than is needed to collect a new frame of 256 samples. At fsam = 40.96 kHz this is 6.25 ms. We are taking 4 ms, therefore the loading is 4/6.25 = 64%.

 

c)      The maximum sampling rate is that for which the total processing time equals 256 × T (the time it takes to collect 256 samples). So, for fsam = fsam|max we have

Tproc / (1 – fsam|max × Tisr) = 256 / fsam|max

that is, fsam|max = 256 / (Tproc + 256 × Tisr).

This gives fsam|max = 57.173 kHz. Obviously this implies a loading of 100%, and this situation should be avoided if the processor is to do anything more (such as sending results to a host PC, as was the case). A good rule of thumb is that the loading should be not much more than about 70%.
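The arithmetic of Example 1 can be checked with a few lines of C. This is only a sketch: the function names are made up for illustration, times are in seconds and rates in hertz, and Tisr is taken as 5.2 microseconds, which is the value consistent with the answer Tt ≈ 4 ms.

```c
#include <assert.h>

/* Total time to finish a background task of duration t_proc when an ISR
 * lasting t_isr fires at rate f_sam:  Tt = Tproc / (1 - fsam * Tisr).
 * Returns a negative value if the ISR alone saturates the processor. */
double total_time(double t_proc, double f_sam, double t_isr)
{
    assert(t_proc >= 0.0 && f_sam > 0.0 && t_isr >= 0.0);
    double isr_fraction = f_sam * t_isr;   /* share of time spent in the ISR */
    if (isr_fraction >= 1.0)
        return -1.0;                       /* background task never runs */
    return t_proc / (1.0 - isr_fraction);
}

/* Loading: total processing time divided by the time to collect one frame. */
double loading(double t_proc, double f_sam, double t_isr, int frame_len)
{
    assert(frame_len > 0);
    double t_frame = (double)frame_len / f_sam;
    return total_time(t_proc, f_sam, t_isr) / t_frame;
}

/* Sampling rate at which the loading reaches 100%:
 * Tproc / (1 - f * Tisr) = n / f   =>   f = n / (Tproc + n * Tisr). */
double max_sampling_rate(double t_proc, double t_isr, int frame_len)
{
    assert(frame_len > 0 && t_proc + frame_len * t_isr > 0.0);
    return (double)frame_len / (t_proc + (double)frame_len * t_isr);
}
```

With the figures of Example 1, total_time(3.14646e-3, 40960.0, 5.2e-6) comes out at about 3.998 ms, loading(3.14646e-3, 40960.0, 5.2e-6, 256) at about 0.64, and max_sampling_rate(3.14646e-3, 5.2e-6, 256) at about 57.17 kHz.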

 

 

Timing issues of direct transfer, poll and transfer, interrupt driven, DMA. Discussion.

 

 

More on loading and timing issues:

 

 

 

Some real-time design issues:

 

 

On safety issues:

Typing, run time promotion, casting, type checking (to prevent unwanted or undesirable type conversions from occurring)

Example:

     int i,m;

     float a,b;

    

     i=m*a+b;

The variable m will be promoted to float, then the multiplication and addition will take place, and finally the result will be demoted to int and stored in i.

 

 

If this is what the programmer wished, he/she should have used type casting and written it explicitly (for clarity and to help document the code), like so

     int i,m;

     float a,b;

    

     i=(int)((float)m*a+b);
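As a minimal sketch of the point about promotion and demotion, the two functions below (their names are illustrative only) perform the same calculation, once with the conversions left to the compiler and once with explicit casts. Both give the same result; the casts only make the conversions visible to the reader.

```c
#include <assert.h>

/* Implicit version: m is promoted to float, the sum is computed in
 * floating point, and the result is demoted (truncated toward zero)
 * when it is assigned to the int. */
int scale_implicit(int m, float a, float b)
{
    int i;
    i = m * a + b;
    return i;
}

/* Explicit version: the same conversions, spelled out with casts. */
int scale_explicit(int m, float a, float b)
{
    return (int)((float)m * a + b);
}

/* Self-check: the two forms must always agree. */
void check_equivalence(int m, float a, float b)
{
    assert(scale_implicit(m, a, b) == scale_explicit(m, a, b));
}
```

With m = 3, a = 2.5 and b = 0.25 both functions return 7: the floating-point result 7.75 is truncated, not rounded.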

 

 

Also, since certain compilers would not convert constants to float at compile-time the programmer should avoid doing this:

     float a,b;

    

     b=a+60;

 

He/she should instead do this:

     float a,b;

    

     b=a+60.0f;

to prevent an unwanted run-time promotion of the constant (the f suffix makes the constant a float; a plain 60.0 has type double in C).

 

Also, since a load followed by a conversion to float takes longer than FLOAD (floating variable LOAD), the following code

     float a,b;

     int  j;

    

     b=a+j;

should instead be written as

     float a,b;

     int  j;

    

     b=a+(float)j;

 

 

Optimisation and when not to use it:

Suppose we are using memory-mapped I/O to pulse a pin using the following extract of code:

     char mem_map_outport;

     mem_map_outport=0;

     mem_map_outport=1;

     mem_map_outport=0;

The code is an obvious candidate for optimisation, and an optimising compiler would replace it with the equivalent of

     mem_map_outport=0;

with potentially catastrophic consequences.

 

Solution: declare the variable volatile, changing the code to:

     volatile char mem_map_outport;

     mem_map_outport=0;

     mem_map_outport=1;

     mem_map_outport=0;
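A slightly fuller sketch of the same idea is given below. On a real target the port would sit at a fixed address taken from the device documentation; here an ordinary variable (simulated_port, a name invented for this example) stands in for the register so that the code also runs on a PC.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the hardware register so that the sketch runs on a PC.
 * On a real target this would be a fixed address from the device manual. */
static volatile uint8_t simulated_port;

/* Pointer to volatile: every write through it must really be performed. */
static volatile uint8_t *const mem_map_outport = &simulated_port;

/* Drive the pin low, high, low. Because the object is volatile the
 * compiler may not merge or drop any of the three stores. */
void pulse_pin(void)
{
    *mem_map_outport = 0;
    *mem_map_outport = 1;
    *mem_map_outport = 0;
    /* In this PC simulation the last value written can be read back. */
    assert(*mem_map_outport == 0);
}
```

Without the volatile qualifier an optimising compiler would be entitled to keep only the last store, which is exactly the failure described above.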

           

Watchdog timer: What is it. ‘Patting the dog’.
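A watchdog timer is a counter that forces a reset unless the software restarts it regularly. The sketch below simulates one in plain C: the timeout value, the variable names and the tick function are all assumptions made for illustration, since on a real processor the counting and the reset are done by hardware.

```c
#include <assert.h>
#include <stdbool.h>

#define WDT_TIMEOUT_TICKS 3u   /* hypothetical timeout, in timer ticks */

static unsigned wdt_counter = WDT_TIMEOUT_TICKS;
static bool     system_reset = false;

/* 'Patting the dog': healthy software calls this regularly to reload
 * the counter before it reaches zero. */
void wdt_pat(void)
{
    wdt_counter = WDT_TIMEOUT_TICKS;
}

/* Called once per timer tick (done by hardware on a real device).
 * If the software has stopped patting the dog, the counter runs out
 * and a reset is forced. */
void wdt_tick(void)
{
    assert(wdt_counter > 0u || system_reset);
    if (wdt_counter > 0u && --wdt_counter == 0u)
        system_reset = true;   /* a real watchdog would reset the processor */
}
```

As long as wdt_pat() is called at least once every WDT_TIMEOUT_TICKS ticks nothing happens; if the main code hangs, the next few ticks set system_reset.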

 

Frequency versus criticality to determine the priority of tasks (or interrupts)

 

Commonly used programming languages for real-time systems:

 

 

 

 

Specification and design

 

Specification is written by the customer and documents what the software is to do and the environment in which it is to do it.

 

Design is written by the software analyst and documents how the software will do it.

 

 

 

Real-time multitasking – How can it be achieved?

 

In the simplest possible system it can be achieved in a polled loop with a round-robin scheme, but it is difficult to ensure fairness if no form of time-slice allocation is created.

 

 

[Figure: round-robin time slicing, with tasks on the vertical axis and time on the horizontal axis]

The switch from task to task happens either when a task completes or when its time slice expires. Without interrupts it is not possible to ensure fairness in the scheduling.
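The simplest case, a polled loop that visits each task in turn, can be sketched as follows. The task functions here are placeholders that only count how often they run. Note that nothing in the loop can cut a slow task short, which is why fairness cannot be guaranteed without a timer interrupt.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*task_fn)(void);

/* Run every task once, in a fixed order, for a given number of rounds.
 * Each task must return promptly: the loop itself cannot preempt it. */
void round_robin(const task_fn tasks[], size_t n_tasks, unsigned rounds)
{
    assert(tasks != NULL && n_tasks > 0);
    for (unsigned r = 0; r < rounds; ++r)
        for (size_t i = 0; i < n_tasks; ++i)
            tasks[i]();
}

/* Hypothetical tasks: each one just counts how often it was given the CPU. */
static unsigned runs_a, runs_b, runs_c;
void task_a(void) { ++runs_a; }
void task_b(void) { ++runs_b; }
void task_c(void) { ++runs_c; }
```

Calling round_robin with the three tasks and five rounds runs each task exactly five times; in an embedded system the outer loop would normally run for ever.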

 

Normally multitasking is achieved with interrupts. A main program runs in the background and one (or more) ISR(s) run in the foreground. With interrupts one can have a preemptive priority system – a higher priority task is said to preempt a lower priority task in a scheme represented as follows

 

[Figure: preemptive priority scheduling, with priority on the vertical axis and time on the horizontal axis]

 

 

 

 


Rate monotonic systems are those in which priorities are assigned so that the higher the execution frequency the higher the priority. The rate monotonic scheme is the optimal arrangement of priorities for a fixed priority system (for periodic tasks whose deadlines equal their periods).
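Rate monotonic assignment amounts to sorting the tasks by period. The sketch below does this with the standard qsort; the rm_task structure and the convention that priority 0 is the highest are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    const char *name;
    double      period_s;   /* time between activations, in seconds */
    int         priority;   /* filled in below: 0 is the highest priority */
} rm_task;

/* Comparison function for qsort: shorter period sorts first. */
static int by_period(const void *a, const void *b)
{
    double pa = ((const rm_task *)a)->period_s;
    double pb = ((const rm_task *)b)->period_s;
    return (pa > pb) - (pa < pb);
}

/* Rate monotonic assignment: the task with the shortest period
 * (highest rate) receives priority 0, the next one priority 1, etc. */
void assign_rate_monotonic(rm_task tasks[], size_t n)
{
    assert(tasks != NULL);
    qsort(tasks, n, sizeof tasks[0], by_period);
    for (size_t i = 0; i < n; ++i)
        tasks[i].priority = (int)i;
}
```

For instance, a sampling task with a period of 25 µs would end up above a task that runs every 6.25 ms, which in turn would end up above a once-per-second logging task.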

 

 

Three safety issues on priorities or priority inversion problems in multitasking environments:

 

Issue 1: Frequency versus criticality. Discuss.

 

Issue 2: A lower priority routine may hold a resource (e.g. using a semaphore) that a higher priority routine needs, so the higher priority routine has to wait for the lower priority one (priority inversion). This is said to be the cause of the repeated system resets suffered by the Mars Pathfinder lander in 1997: a low priority task held a shared resource and was pre-empted by a task of intermediate priority, so a higher priority task that needed the same resource could not run until the low priority task released it…

 

Issue 3: on object oriented systems.

Object-oriented languages support:

·        Abstract data types

·        Inheritance

·        Polymorphism

Due to attribute inheritance, a class may inherit priorities that are in conflict with its intent. The solution suggested by Laplante (1997) is a careful assignment of attributes. Actually that author questions the applicability of object-oriented languages for implementation of RT systems.

 

 


Real-time systems based on interrupts:

 

Example 1 - echoint.asm

 

*****************************************************************

*       ECHOINT.ASM Program                                     *

*               Reads A/D and then echoes to D/A                *

*****************************************************************

;

; To run on the DSK modules with the TMS320C50 DSP chip

; F.S. Schlindwein, April 1995 - April 1998

;

; Declare memory mapped registers and program block address

        .mmregs              ;include memory mapped regs

;

; Define values to be programmed into AIC registers:

;---------------------------------------------------------------

;

; Fs=MCLK/(2*RA*RB)          ; sampling frequency

; Flp=MCLK/(80*RA)            ; low pass filter (antialiasing)

;

; 5 < RA < 32   ; RA is 5 bits (and the filter behaves funny if RA<6)

; 1 < RB < 64  ; RB is 6 bits

;

AIC_CTR  .word     8h       ; for use without BP filter

;AIC_CTR  .word     9h       ; for use with BP filter

TA       .word     6        ;       Auxin -----+  +----- Loopback

RA       .word     6        ;       Synch --+  |  |  +-- BP Filter

TAp      .word     1        ;               |  |  |  |

RAp      .word     1        ;+------------+------------+

TB       .word     18       ;|00 00 G1 G0 | SY AX LB BP|

RB       .word     18       ;+------------+------------+

AIC_CMD  .word     080h     ;       |  |

;                                   +--+---> GAIN = G1,G0

*

****************************************************************

*   Set up the ISR vectors                                     *

****************************************************************

       .ps     0080ah

       B       RINT         ; Set receive interrupt vector RINT, and

       B       XINT         ; Serial port transmit interrupt XINT.

*

        .ps     00a00h

        .entry              ; initial PC address

INIT

        setc    INTM        ; globally disable interrupts

        LDP     #0          ; initialise data page to ZERO

        setc    SXM         ; set sign extension mode

        setc    OVM          ; set overflow saturation mode

;---------------------------------------------------------------

        LDP     DXR         ; Load data page for DXR (zero)

        LAMM    IMR         ; load interrupt mask register

        OR      #30h        ; Turn on receive and transmit interrupts

        SAMM    IMR         ; store into interrupt mask register

        CLRC    INTM        ; globally enable interrupts

*

        call    AICINIT      ; Initialise TLC320C40 AIC chip

*

LOOP:  

        nop                  ; LOOP doing nothing.

        nop                  ; all runs in the ISR (RINT)

        B       LOOP

*

RINT:                             

        PUSH                ; push accumulator onto stack

        LAMM    DRR          ; Load Acc with Data Rx Register

                            ; (i.e. read A/D)

        AND     #0FFFCh     ; Clear d00=d01=0 on accumulator

        SAMM    DXR          ; Store Acc into Data Tx Register

                            ; (i.e. echo to D/A)

        POP                 ; restore Acc

XINT:   NOP                 ; do nothing

        NOP                 ; and then

        RETE                ; return from interrupt & re-enable interrupts

*

*********************************************************************

*     AICINIT                                                       *

*     DESCRIPTION: This routine initializes the TLC320C40 for a     *

*     sample rate defined by RA, RB, with a gain setting of 1       *

*********************************************************************

*

AICINIT: SPLK    #20h,TCR           ; Let's generate 10 MHz from Tout

         SPLK    #01h,PRD           ; for AIC master clock

      . . .               ; NB code removed

         IDLE

         RET

;

        .end                        ; end of the program

 

 

Example 2, in C, with more than one ISR:

 

void main (void)

 

{

     aicinit();         // initialise system and AIC

// (Analog Interface Chip)

     while(1);          // infinite loop …

//    idle()            // …or idle(), if processor supports it

}

 

void isr1 (void)

{

     push_all();        // save context on stack

     do_task1();        // service interrupt 1

     pop_all();         // restore context

}

 

void isr2 (void)

{

     push_all();        // save context on stack

     do_task2();        // service interrupt 2

     pop_all();         // restore context

}

 

void isr3 (void)

{

     push_all();        // save context on stack

     do_task3();        // service interrupt 3

     pop_all();         // restore context

}

 

 

Circular buffers in RT applications

 

An efficient and elegant way to implement signal collection and signal processing is to have 2 environments…

signal collection in the foreground (using interrupts or DMA) and

signal processing in the background (main program, in a loop testing for status)

…using the concept of circular buffers, i.e., both the signal collection and the signal processing fill (collection) and empty (processing) buffers in circular form.

 

Provided that the average time taken to process a buffer is less than the time to fill a buffer, and that you implement enough buffers and test for overrun, things will be smooth (as explained in my lecture).
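The scheme can be sketched in C as a ring of buffers shared between a foreground routine that fills them and a background routine that empties them. The number of buffers, the function names and the use of a simple counter of full buffers are choices made for this illustration, not the code used in the lectures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define N_BUFFERS 4      /* hypothetical number of buffers in the ring */
#define FRAME_LEN 256    /* samples per buffer */

static volatile int  ring[N_BUFFERS][FRAME_LEN];
static volatile int  fill_buf, fill_pos;  /* written by the foreground only */
static volatile int  full_count;          /* buffers waiting to be processed */
static int           proc_buf;            /* used by the background only */
static volatile bool overrun;

/* Foreground: call once per sample (from the ISR on a real system). */
void collect_sample(int sample)
{
    if (full_count == N_BUFFERS) {        /* no free buffer: data is lost */
        overrun = true;
        return;
    }
    ring[fill_buf][fill_pos] = sample;
    if (++fill_pos == FRAME_LEN) {        /* buffer complete, move to next */
        fill_pos = 0;
        fill_buf = (fill_buf + 1) % N_BUFFERS;
        ++full_count;
    }
}

/* Background: returns true and the buffer index if a full frame is ready. */
bool next_full_buffer(int *index)
{
    assert(index != NULL);
    if (full_count == 0)
        return false;
    *index = proc_buf;
    return true;
}

/* Background: hand the buffer back after processing it. */
void release_buffer(void)
{
    assert(full_count > 0);
    proc_buf = (proc_buf + 1) % N_BUFFERS;
    --full_count;     /* on a real target, do this with interrupts masked */
}
```

The main loop would poll next_full_buffer(), process the frame it points to, call release_buffer(), and check the overrun flag; if the flag is ever set, the background is too slow or more buffers are needed.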

 

The following is an extract of a real-time program that I wrote for a TMS32020 DSP (the grandfather of the C50; the sequence was 32010, 32020, 320C25, 320C50, 320C5402). It basically implements an FFT in the background while sampling at up to 40.96 kSPS in the foreground (ISR). Note the implementation of the circular buffer with the AND and the OR instructions (actually ANDK = AND with an immediate value and ORK = OR with an immediate value; Texas Instruments uses ‘K’ to indicate immediate… crazy Assembly language!).
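For readers more at home in C, the pointer wrap performed with ANDK and ORK can be written as below. The constants are the FIMBI and INIBI values of the listing (buffer from 300h to 3FFh); the function name is made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define INIBI 0x300u   /* first address of the input circular buffer */
#define FIMBI 0x3FFu   /* last address of the input circular buffer */

/* Advance the sample pointer by one and keep it inside 300h..3FFh.
 * The AND with 3FFh discards the carry into bit 10 when the pointer
 * steps past 3FFh; the OR with 300h then restores the base address. */
uint16_t next_sample_pointer(uint16_t ptr)
{
    assert(ptr >= INIBI && ptr <= FIMBI);
    return (uint16_t)(((ptr + 1u) & FIMBI) | INIBI);
}
```

The trick needs no comparison or branch, but it only works because the buffer length (256 words) is a power of two and the buffer starts on a multiple of its length.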

 

* Nota Bene: Lots of code removed from the ‘main’…

* FSS removed that for clarity, May 2002.

 

      IDT   'FFTN89'

      REF   FFT256,FFTIN

*

*************************************

*

*     MAIN PROGRAM TO CALL FFT256.

*     LINK IT TO FFT256.MPO & FFTIN.MPO

*    

*     Real samples in pages 6 and 7 - See Interrupt Service Routine.

*     Calculation takes place on pages 4 and 5.

*     Results are copied to data RAM - pgs 8 and 9, then

*     the DSP board issues an INTR to the NIMBUS PC, waits 6.25 ms

*     and starts another FFT.

*

*           F.S.SCHLINDWEIN, 26-04-86, 23-11-89.

*

**************************************************************************

*

START EQU   >550

*

PAGE0       EQU   0     Page 0 of data memory for memory-mapped registers

PAGE10      EQU   10    PAGE 10 (500H) FOR RESULTS

TEMP        EQU   >63   Word 63h of B2 will be temporary store

RTPTRS      EQU   >74   REAL TIME POINTER OF SAMPLES

IMR         EQU   >4    Address of Interrupt Mask Register in Page Zero

*

* HARDWARE CONSTANTS

*

PORT0 EQU   0     Port 0 generates an INT2 to NIMBUS

*                 every time the 32020 writes to it

*                 since the NIMBUS had initialised the feature

TIM   EQU   1     Port 1 is the timer address

ADCNV EQU   2     A/D CONVERTER. Put link at LK6a.

*                 Port 2 is the ADC address when using

*                 interval timer sample clocking

DAC   EQU   2     Port 2 is the DAC address when using

*                 interval timer. Put link at LK6a.

*

ADC1  EQU   4     ADC's of blood pressure

ADC2  EQU   5

*

*

IMASK EQU   >FFC2 Interrupt mask to enable INT1 only

FIMBI EQU   >3FF  Last position of the input circular buffer

INIBI EQU   >300  First position of the input circular buffer

*

*

*************************

*                       *

*    MAIN PROGRAM       *

*     (extract only)    *

*                       *

*************************

*

      RORG  START

      B     FFT

*

*******************************************************************

* DON'T     PUT THESE 4 INSTRUCTIONS AT THE     BEGINNING IF USING LODI3 !!

      RORG  0

      B     START

*

      RORG  >4    INT1 VECTOR ( TIME TO CONVERT )

      B     ISR

************************************************************

      RORG  START+2

FFT

      LDPK  PAGE0       Page pointer set to 0

      LRLK  >0,TIMVAL   Timer value in AR0

      SAR   >0,TEMP     Store in data memory temporarily

      OUT   TEMP,TIM    Output value to timer port

*

      LRLK  >1,IMASK    Load interrupt mask into AR1

      SAR   >1,IMR      Put it into Mask Register

*

* INITIALIZATIONS:

*

      SOVM

      SSXM

      SPM   0

      CNFD

      LALK  INIBI+1

      SACL  RTPTRS            POINTER OF SAMPLES

*

*

* Nota Bene: Lots of other code here…

* FSS removed it for clarity, May 2002.

*

* Wait for a full frame (256 samples)

LOOP0F:

      LAC   RTPTRS      LET'S GET THE ISR POINTER

      ANDK  >0F         MASK IT TO SEE IF IT'S TIME...

      BNZ   LOOP0F      NOT YET.

LPFF: LAC   RTPTRS      LET'S GET THE ISR POINTER

      ANDK  >FF         MASK IT TO SEE IF IT'S TIME...

      BNZ   LPFF        NOT YET.

*

*

* Compute the FFT

*

AGAIN:

      CALL  FFT256

*

*

*****************************************************

*

      OUT   0,PORT0     FFT done: INTERRUPT THE NIMBUS PC

*

******************************************************

*

* TEST OF TIMING TO WAIT 6.25 ms BEFORE

* RESTARTING ANOTHER FFT.

*

LOOP5:

      LAC   RTPTRS      GET THE ISR POINTER

      ANDK  MASK5       MASK IT TO SEE IF IT'S TIME...

      BNZ   LOOP5       NOT YET.

      B     AGAIN       OK, IT'S TIME!

*

***************************************************************************

*

* Interrupt service routine

* Loads a circular buffer 300h to 3FFh

*

***************************************************************************

 

ISR LARP   4

    SST1   *-

    SST    *-

    SACH   *-

    SACL   *-

    SAR    AR3,*-

*

    LDPK   PAGE0

    LAR    AR3,RTPTRS

    SAR    AR3,*

    LAC    *

    ADLK   1

    ANDK   FIMBI

    ORK    INIBI

    SACL   RTPTRS

    SACL   *

    LAR    AR3,*+,AR3

    IN     *,ADCNV,AR4

*

    LAR    AR3,*+

    ZALS   *+

    ADDH   *+

    LST    *+

    LST1   *

    EINT

    RET

*

    END

 

 

 

Bibliography

 

 

Laplante, Philip A. “Real-time systems design and analysis – An Engineer’s Handbook”, 2nd ed., IEEE Press, Piscataway, NJ, USA, http://www.ieee.org, 1997.

Widrow, B. et al. “Adaptive noise cancelling: principles and applications”, Proc. IEEE, vol. 63, pp. 1692-76, 1975.

Widrow, B. and Stearns, S.D. “Adaptive Signal Processing”, Prentice-Hall, 1985.

Bateman, Andrew and Paterson-Stephens, Iain “The DSP Handbook – Algorithms, applications and design techniques”, Prentice-Hall, http://www.DSPStore.com, 2002.

Burns, A. and Wellings, A. “Real-time systems and their programming languages”, Addison-Wesley, Wokingham, England, 1995.

Cioffi, J.M. and Kailath, T. “Fast Recursive-Least-Squares Filters for Adaptive Filtering”, IEEE Transactions on Acoustics, Speech and Signal Processing, vol. ASSP-32, pp. 304-38, April 1984.

Ifeachor, E.C. and Jervis, B.W. “Digital Signal Processing, a practical approach”, Addison-Wesley, 1993.

“MATLAB Wavelets toolbox manual”, Mathworks, 1997.

Kay, S.M. and Marple Jr., S.L. “Spectrum Analysis – a Modern Perspective”, Proc. IEEE, vol. 69, no. 11, pp. 1380-1419, 1981.

Lin, Kun-Shan “Digital Signal Processing applications with the TMS320 family”, vol. 1, Texas Instruments, Prentice-Hall, Inc., Englewood Cliffs, New Jersey 07632, 1987.

Oppenheim, A.V. and Schafer, R.W. “Discrete-Time Signal Processing”, Prentice Hall, 1989.

Young, S.J. “Real time languages: Design and development”, Ellis Horwood, Chichester, 1982.

 

Fernando S. Schlindwein, April/May 2002 to March 2004.