An Introduction to 3L Diamond on Sundance Hardware Some slides have extra information as notes.
What is 3L Diamond? Diamond is a set of tools and other components that work together with the TI C compiler and linker to support applications using multiprocessor hardware. Sundance hardware is well suited to Diamond’s way of dealing with multiprocessors, and the combination provides the most rapid way to get your application running efficiently.
Why Diamond? The first response of many people when offered Diamond is: “We do not need any extra software. Code Composer Studio provides everything we need to write multiprocessor applications.” Is this really true?
The Hardware The structure of Sundance hardware is a good place to start. Sundance provides modular hardware that allows you to build complex multiprocessor systems. Modules include an FPGA that is used to implement interprocessor  links  that allow pairs of processors to communicate. These include comports and SDBs.
A Sundance Module
Typical Hardware
Scaling Sundance hardware scales: There are no shared resources Adding processors adds communication No contention for shared memory or busses
How to Develop Applications Given hardware like this, the first thought will be that Code Composer from TI is ideal for developing applications. We shall now investigate this thought.
Code Composer Studio A good platform for single-processor work. No real support for multiprocessors. CCS is really a single-processor system You have to treat each processor separately. You build separate programs for each processor as follows:
Building with CCS
Problem: Specification You have to divide your application into separate programs for each processor. Modularity should be driven by the  program  structure. You should not use the  hardware  structure. Difficult to use several developers: only one program for each processor Difficult to test components hard to make each processor work in isolation
How do you load the application? You have to load using JTAG JTAG is very slow (0.2MB/s) You have all the parts of your application as separate  .out  files, one for each processor. You have to load these, one at a time. it is very easy to load the wrong processor it is very easy to forget to load a processor instructions for your users are complicated
Problem: Loading Customers need CCS (or load from ROM) Difficult to develop your own host program You can’t use JTAG from a program. You must use a separate mechanism to allow processors to communicate. This means you have to maintain two, unrelated networks: JTAG chain for loading I/O network for communication
Problem: Host integration Host communication is with JTAG. very slow very difficult to add your own host code Need to use other devices need to write host driver code how to start the host code & DSP code?
Problem: Communication How do the processors communicate? No support for Sundance peripherals Need to write device drivers Learn device details Manage EDMA Deal with EDMA coherency problems Manage interrupts Learn the tricks to make them run fast
Problem: Message routing If two processors want to exchange data but there is no direct connection between them, the data will have to be routed through intermediate nodes. How do you do this? How do you construct routing tables? by hand? build in knowledge of the processor network?
Problem: Deadlock A problem with all message routing systems is deadlocking. This is when sending data from one processor to another has to wait for data to be transmitted between another pair of processors, but that transmission needs to wait for the first to complete!
Deadlock prevention options Use a proven deadlock-free system. Make the user stop the program and change parameters each time a deadlock happens. Hope it never happens. The most common technique is: Be completely unaware deadlock can happen.
Problem: The Cache There are problems with cache coherency The cache cannot maintain coherence between: external memory EDMA transfers Transfers must handle cache coherency you cannot turn the cache off cache errors are very hard to find You have to sort out all these problems.
Why loading may fail JTAG loading assumes the cache is clear. This is not true with Sundance hardware.  After reset, a bootloader is loaded from ROM and executed. This initialises the processor and configures the FPGA to implement the inter-processor communication links. The code for the bootloader gets into the cache. JTAG loads behind the cache, leading to inconsistencies that prevent programs running.
Problem: Making changes How do you change the network? Rewrite sections of your code Are there enough EDMA channels? only 4 external interrupt lines for synchronisation what if you use more than 4 devices? host comport (2 devices) comport to another processor (2 devices) SDB to another processor (2 devices) that is already 6 devices
Problem: Changing Devices How do you change processors? different device addresses different memory sizes different memory addresses different initialisation requirements With CCS:  rewrite sections of your code.
Problem: Choosing devices Comports Sundance Digital Bus (SDB) Rocket I/O You need to learn how to use them. You need to write & maintain device drivers. You need to change your code to use them.
Before you start coding… Be certain you know how to partition the problem. Be certain you know how much memory you need. Be certain you know which modules you need. Be certain of the system topology. …  because it will be very hard to change.
The advantage of CCS You have complete control of everything… … because you have to do everything yourself …  and this takes a lot of time and experience.
CCS: Summary CCS works well with single processors It was not designed for multiple processors You have to do all the hard work Knowledge gets built into the application: processor types memory layout I/O devices being used connections between processors It is very hard to make significant changes.
Diamond Originally designed in 1987 tried and tested proven model Designed for multiprocessor systems Designed for simplicity Designed for efficiency during development during execution
Some advantages of Diamond Easy to use Gives you flexibility: late binding easy to change topology easy to change modules Reduces housekeeping memory usually allocated for you interrupts handled for you loading managed for you communication details managed for you processor issues handled for you
What Diamond is not Diamond is not a compiler we use the standard TI compiler and linker Diamond is not a simulator or an interpreter real, optimised code is generated Diamond is not DSP/BIOS it has its own optimised kernel, designed for multiprocessor operation it does not have or need a large API
Building with Diamond You partition the application into tasks: modularity determined by the needs of the application; you ignore processors here. Diamond adds an extra  configuration  step. The configurer: can see the whole application can optimise communication and device access. builds a  single  output file; nothing can get lost. arranges to load from this single file.
Building with CCS
Building with Diamond
With Diamond… The application is in a single file. Nothing can get lost. You cannot get loading wrong. Loading is easy load from the host no need for ROM during development development is fast
Diamond… is designed for multiprocessor systems. has its own small, efficient microkernel. has a small but effective API. is optimised for target hardware: it knows about different modules it automatically inserts optimised device drivers it handles interrupts it handles memory and the cache is very good at communication leaves you free to concentrate on your code.
Sundance TIMs
Dual-Processor Module Identical to two separate modules; there are no shared resources.
The Diamond Model Diamond builds applications from independent tasks that send data to other tasks using channels. This model is based upon CSP: Communicating Sequential Processes.
CSP Communicating Sequential Processes Forget about processors
A Diamond application is… Tasks complete C programs start at a  main  function fully linked (but relocatable) input & output  ports  for connecting channels unlimited number of ports Multi-threaded Channels data transfer mechanisms transfer data from one task to one other blocking: both ends wait for completion
Channels Many possible implementations memcpy – between tasks on one processor I/O – between adjacent processors comports SDBs Rapid IO links Routed I/O – between remote processors software routing guaranteed deadlock-free any task can communicate with any other task Diamond will choose the best implementation.
The Hardware
A Sundance Network
Ideal Hardware No shared resources Simplifies hardware Simplifies software Scales: more processors = more power Connected by communication links Add processors = add bandwidth Designing multiprocessor hardware: Speak to 3L first.
Tasks & Channels
Map onto hardware
A simple task
A simple task
    #include <chan.h>

    INPUT_PORT(0, DATA_IN)
    OUTPUT_PORT(0, DATA_OUT)

    main()
    {
        int n;
        for (;;) {
            chan_in_word (&n, &DATA_IN);
            chan_out_word(n+1, &DATA_OUT);
        }
    }
Team Working Tasks are self-contained They are developed separately Communication between tasks: is a contract allows test systems to be built Ideal for team working
Design Flow Network Tasks Channels
Design Flow Network Code tasks Sources
Design Flow Network Code tasks Compile & Link Tasks
Design Flow Network Code tasks Compile & Link Configuration File configuration file
Design Flow Network Code tasks Compile & Link Configuration File Configure application file
Design Flow Network Code tasks Compile & Link Configuration File Configure Load & Run application file processor network
Running an application
Demonstration Hardware SMT365 SMT370 SMT374 SMT361 Only the SMT365 and the SMT361 will be used in the examples.
A Correlator Example
Code Each Task
    OUTPUT_PORT(2, COR_DATA)
    INPUT_PORT (1, COR_RESULT)
    . . .
    main()
    {
        printf("3L Diamond Example\n");
        for (;;) {
            . . .
            chan_out_message(BYTES, Data, &COR_DATA);
            chan_in_message(BYTES, Result, &COR_RESULT);
            . . .
        }
    }
Configuration Write a configuration file to: Describe the hardware processors connections between processors Describe the software tasks channels connecting tasks Map the software onto the hardware place tasks on processors
Task names
    TASK example2
    TASK mainctrl
    TASK disp_raw
    TASK disp_cor
    TASK UI
    TASK correlator

Task ports
    TASK example2    INS=3  OUTS=7
    TASK mainctrl    INS=1  OUTS=1
    TASK disp_raw    INS=2  OUTS=0
    TASK disp_cor    INS=2  OUTS=0
    TASK UI          INS=1  OUTS=1
    TASK correlator  INS=1  OUTS=1

Task stack & heap
    TASK example2    INS=3  OUTS=7  DATA=500K
    TASK mainctrl    INS=1  OUTS=1  DATA=200K
    TASK disp_raw    INS=2  OUTS=0  DATA=200K
    TASK disp_cor    INS=2  OUTS=0  DATA=200K
    TASK UI          INS=1  OUTS=1  DATA=200K
    TASK correlator  INS=1  OUTS=1  DATA=32K

Task starting priorities
    TASK example2    urgent      INS=3  OUTS=7  DATA=500K
    TASK mainctrl                INS=1  OUTS=1  DATA=200K
    TASK disp_raw                INS=2  OUTS=0  DATA=200K
    TASK disp_cor                INS=2  OUTS=0  DATA=200K
    TASK UI          urgent      INS=1  OUTS=1  DATA=200K
    TASK correlator  priority=2  INS=1  OUTS=1  DATA=32K
    ! The starting priority is 1 unless explicitly stated.

Channel creation
    !       channel  output port    input port
    !       =======  ===========    ==========
    CONNECT C1       UI[0]          example2[0]
    CONNECT C2       example2[5]    mainctrl[0]
    CONNECT C3       mainctrl[0]    example2[2]
    CONNECT C4       example2[0]    disp_raw[0]
    CONNECT C5       example2[1]    disp_raw[1]
    CONNECT C6       example2[2]    correlator[0]
    CONNECT C7       correlator[0]  example2[1]
    CONNECT C8       example2[3]    disp_cor[0]
    CONNECT C9       example2[4]    disp_cor[1]
    CONNECT C10      example2[6]    UI[0]

The processor & placement
    PROCESSOR Root SMT365_8_1
    …
    PLACE mainctrl    Root
    PLACE example2    Root
    PLACE disp_raw    Root
    PLACE disp_cor    Root
    PLACE UI          Root
    PLACE correlator  Root
Processor types Diamond supports all of the Sundance TIMs. The  ProcType  utility will display them all.
A note about memory With CCS you need to: specify memory explicitly. know which “sections” are used by the compiler allocate memory explicitly at the start Diamond can do all memory allocation available memory determined automatically no linker command files but, you can tell Diamond how to use memory this is an optimisation once the code is working. ignore it until the program’s needs are understood.
Building & Running
    Compile each task with the command:  3L C
    Link each task with the command:     3L T
    Configure with the command:          3L A
    Execute with the command:            3L X
Making it run faster
Use a second processor We shall use TIM1 (SMT365) and TIM4 (SMT361) connected by comports 0 & 3 respectively.
Demonstration Hardware SMT365 SMT370 SMT374 SMT361
Use a second processor
    PROCESSOR Root SMT365_8_1
    …
    PLACE mainctrl    Root
    PLACE example2    Root
    PLACE disp_raw    Root
    PLACE disp_cor    Root
    PLACE UI          Root
    PLACE correlator  Root

Use a second processor
    PROCESSOR Root SMT365_8_1
    PROCESSOR Node SMT361
    …
    PLACE mainctrl    Root
    PLACE example2    Root
    PLACE disp_raw    Root
    PLACE disp_cor    Root
    PLACE UI          Root
    PLACE correlator  Root

Use a second processor
    PROCESSOR Root SMT365_8_1
    PROCESSOR Node SMT361
    WIRE W1 Root[CP:0] Node[CP:3]
    …
    PLACE mainctrl    Root
    PLACE example2    Root
    PLACE disp_raw    Root
    PLACE disp_cor    Root
    PLACE UI          Root
    PLACE correlator  Root

Use a second processor
    PROCESSOR Root SMT365_8_1
    PROCESSOR Node SMT361
    WIRE W1 Root[CP:0] Node[CP:3]
    …
    PLACE mainctrl    Root
    PLACE example2    Root
    PLACE disp_raw    Root
    PLACE disp_cor    Root
    PLACE UI          Root
    PLACE correlator  Node
Notes The two tasks have not changed in any way. Their connections have not changed. No need to recompile them or relink them. All we changed to move the tasks onto a second processor was the configuration file. We just built a new application by running the configuration command again (3L A). Loading the two processors is automatic.
Making it go even faster
Use the FPGA on the SMT365
    PROCESSOR Root SMT365_8_1
    PROCESSOR F    FPGA
    …
    PLACE mainctrl    Root
    PLACE example2    Root
    PLACE disp_raw    Root
    PLACE disp_cor    Root
    PLACE UI          Root
    PLACE correlator  Root

The FPGA is already being used The FPGA is also used to support functions on the SMT365 DSP. Attaching the FPGA to its processor allows the configurer to include all the necessary logic to support the needed functions.

Use the FPGA
    PROCESSOR Root SMT365_8_1
    PROCESSOR F    FPGA  ATTACH=Root
    …
    PLACE mainctrl    Root
    PLACE example2    Root
    PLACE disp_raw    Root
    PLACE disp_cor    Root
    PLACE UI          Root
    PLACE correlator  Root

Use the FPGA
    PROCESSOR Root SMT365_8_1
    PROCESSOR F    FPGA  ATTACH=Root
    WIRE W1 Root[SDB:0] F[SDB_DEVICE:0]
    …
    PLACE mainctrl    Root
    PLACE example2    Root
    PLACE disp_raw    Root
    PLACE disp_cor    Root
    PLACE UI          Root
    PLACE correlator  Root

Use the FPGA
    PROCESSOR Root SMT365_8_1
    PROCESSOR F    FPGA  ATTACH=Root
    WIRE W1 Root[SDB:0] F[SDB_DEVICE:0]
    …
    PLACE mainctrl    Root
    PLACE example2    Root
    PLACE disp_raw    Root
    PLACE disp_cor    Root
    PLACE UI          Root
    PLACE correlator  F
FPGA Tasks Placing a task on an FPGA instructs the configurer to look for an FPGA version of the task. This can be written using: VHDL Xilinx System Generator Handel-C (Celoxica) Any other method you like.
Building with FPGA The configurer will construct a Xilinx project for the FPGA. It will call the Xilinx tools to build a complete bitstream. The bitstream will be included in the single application file. The FPGA will be configured automatically as the application is loaded.
Conclusion Diamond does a lot of the work for you. Diamond allows you to change your mind and alter processors and topology. Diamond gives a structured model for developing efficient applications. The Diamond model is the same for any number and any combination of processors: DSP or FPGA. Diamond simplifies developing multiprocessor applications.
 


Overview

  • 1.  
  • 2. An Introduction to 3L Diamond on Sundance Hardware Some slides have extra information as notes.
  • 3. What is 3L Diamond? Diamond is a set of tools and other components that work together with the TI C compiler and linker to support applications using multiprocessor hardware. Sundance hardware is well-suited Diamond’s way of dealing with multiprocessors and the combination provides the most rapid way to get your application running efficiently.
  • 4. Why Diamond? The first response of many people when offered Diamond is: “ We do not need any extra software. Code Composer Studio provides everything we need to write multiprocessor applications.” Is this really true?
  • 5. The Hardware The structure of Sundance hardware is a good place to start. Sundance provides modular hardware that allows you to build complex multiprocessor systems. Modules include an FPGA that is used to implement interprocessor links that allow pairs of processors to communicate. These include comports and SDBs.
  • 8. Scaling Sundance hardware scales: There are no shared resources Adding processors adds communication No contention for shared memory or busses
  • 9. How to Develop Applications Given hardware like this, the first thought will be that Code Composer from TI is ideal for developing applications. We shall now investigate this thought.
  • 10. Code Composer Studio A good platform for single-processor work. No real support for multiprocessors. CCS is really a single-processor system You have to treat each processor separately. You build separate programs for each processor as follows:
  • 12. Problem: Specification You have to divide your application into separate programs for each processor. Modularity should be driven by the program structure. You should not use the hardware structure. Difficult to use several developers: only one program for each processor Difficult to test components hard to make each processor work in isolation
  • 13. How do you load the application? You have to load using JTAG JTAG is very slow (0.2MB/s) You have all the parts of your application as separate .out files, one for each processor. You have to load these, one at a time. it is very easy to load the wrong processor it is very easy to forget to load a processor instructions for your users are complicated
  • 14. Problem: Loading Customers need CCS (or load from ROM) Difficult to develop your own host program You can’t use JTAG from a program. You must use a separate mechanism to allow processors to communicate. This means you have to maintain two, unrelated networks: JTAG chain for loading I/O network for communication
  • 15. Problem: Host integration Host communication is with JTAG. very slow very difficult to add your own host code Need to use other devices need to write host driver code how to start the host code & DSP code?
  • 16. Problem: Communication How do the processors communicate? No support for Sundance peripherals Need to write device drivers Learn device details Manage EDMA Deal with EDMA coherency problems Manage interrupts Learn the tricks to make them run fast
  • 17. Problem: Message routing If two processors want to exchange data but there is no direct connection between them, the data will have to be routed through intermediate nodes. How do you do this? How do you construct routing tables? by hand? build in knowledge of the processor network?
  • 18. Problem: Deadlock A problem with all message routing systems is deadlocking. This is when sending data from one processor to another has to wait for data to be transmitted between another pair of processors, but that transmission needs to wait for the first to complete!
  • 19. Deadlock prevention options Use a proven deadlock-free system. Make the user stop the program and change parameters each time a deadlock happens. Hope it never happens. The most common technique is: Be completely unaware deadlock can happen.
  • 20. Problem: The Cache There are problems with cache coherency The cache cannot maintain coherence between: external memory EDMA transfers Transfers must handle cache coherency you cannot turn the cache off cache errors are very hard to find You have to sort out all these problems.
  • 21. Why loading may fail JTAG loading assumes the cache is clear. This is not true with Sundance hardware. After reset, a bootloader is loaded from ROM and executed. This initialises the processor and configures the FPGA to implement the inter-processor communication links. The code for the bootloader gets into the cache. JTAG loads behind the cache, leading to inconsistencies that prevent programs running.
  • 22. Problem: Making changes How do you change the network? Rewrite sections of your code Are there enough EDMA channels? only 4 external interrupt lines for synchronisation what if you use more than 4 devices? host comport (2 devices) comport to another processor (2 devices) SDB to another processor (2 devices) that is already 6 devices
  • 23. Problem: Changing Devices How do you change processors? different device addresses different memory sizes different memory addresses different initialisation requirements With CCS: rewrite sections of your code.
  • 24. Problem: Choosing devices Comports Sundance Digital Bus (SDB) Rocket I/O You need to learn how to use them. You need to write & maintain device drivers. You need to change your code to use them.
  • 25. Before you start coding… Be certain you know how to partition the problem. Be certain you know how much memory you need. Be certain you know which modules you need. Be certain of the system topology. … because it will be very hard to change.
  • 26. The advantage of CCS You have complete control of everything… … because you have to do everything yourself … and this takes a lot of time and experience.
  • 27. CCS: Summary CCS works well with single processors It was not designed for multiple processors You have to do all the hard work Knowledge gets built into the application: processor types memory layout I/O devices being used connections between processors It is very hard to make significant changes.
  • 28. Diamond Originally designed in 1987 tried and tested proven model Designed for multiprocessor systems Designed for simplicity Designed for efficiency during development during execution
  • 29. Some advantages of Diamond Easy to use Gives you flexibility: late binding easy to change topology easy to change modules Reduces housekeeping memory usually allocated for you interrupts handled for you loading managed for you communication details managed for you processor issues handled for you
  • 30. What Diamond is not Diamond is not a compiler we use the standard TI compiler and linker Diamond is not a simulator or an interpreter real, optimised code is generated Diamond is not DSP/BIOS it has it’s own optimised kernel, designed for multiprocessor operation it does not have or need a large API
  • 31. Building with Diamond You partition the application into tasks: modularity determined by the needs of the application; you ignore processors here. Diamond adds an extra configuration step. The configurer: can see the whole application can optimise communication and device access. builds a single output file; nothing can get lost. arranges to load from this single file.
  • 34. With Diamond… The application is in a single file. Nothing can get lost. You cannot get loading wrong. Loading is easy load from the host no need for ROM during development development is fast
  • 35. Diamond… is designed for multiprocessor systems. has its own small, efficient microkernel. has a small but effective API. is optimised for target hardware: it knows about different modules it automatically inserts optimised device drivers it handles interrupts it handles memory and the cache is very good at communication leaves you free to concentrate on your code.
  • 37. Dual-Processor Module Identical to two separate modules; there are no shared resources.
  • 38. The Diamond Model Diamond builds applications from independent tasks that send data to other tasks using channels . This model is based upon CSP: Communicating Sequential Processes.
  • 39. CSP Communicating Sequential Processes Forget about processors
  • 40. A Diamond application is… Tasks complete C programs start at a main function fully linked (but relocatable) input & output ports for connecting channels unlimited number of ports Multi-threaded Channels data transfer mechanisms transfer data from one task to one other blocking: both ends wait for completion
  • 41. Channels Many possible implementations memcpy – between tasks on one processor I/O - between adjacent processors comports SDBs Rapid IO links Routed I/O – between remote processors software routing guaranteed deadlock-free any task can communicate with any other task Diamond will choose the best implementation.
  • 44. Ideal Hardware No shared resources Simplifies hardware Simplifies software Scales: more processors = more power Connected by communication links Add processors = add bandwidth Designing multiprocessor hardware: Speak to 3L first.
  • 48. A simple task #include <chan.h> INPUT_PORT(0, DATA_IN) OUTPUT_PORT(0, DATA_OUT) main() { int n; for (;;) { chan_in_word (&n, &DATA_IN); chan_out_word(n+1, &DATA_OUT); } }
  • 49. Team Working Tasks are self-contained They are developed separately Communication between tasks: is a contract allows test systems to be built Ideal for team working
  • 50. Design Flow Network Tasks Channels
  • 51. Design Flow Network Code tasks Sources
  • 52. Design Flow Network Code tasks Compile & Link Tasks
  • 53. Design Flow Network Code tasks Compile & Link Configuration File configuration file
  • 54. Design Flow Network Code tasks Compile & Link Configuration File Configure application file
  • 55. Design Flow Network Code tasks Compile & Link Configuration File Configure Load & Run application file processor network
  • 57. Demonstration Hardware SMT365 SMT370 SMT374 SMT361 Only the SMT365 and the SMT361 will be used in the examples.
  • 59. Code Each Task
      OUTPUT_PORT(2, COR_DATA)
      INPUT_PORT (1, COR_RESULT)
      . . .
      main()
      {
          printf("3L Diamond Example\n");
          for (;;) {
              . . .
              chan_out_message(BYTES, Data, &COR_DATA);
              chan_in_message(BYTES, Result, &COR_RESULT);
              . . .
          }
      }
  • 60. Configuration Write a configuration file to: Describe the hardware processors connections between processors Describe the software tasks channels connecting tasks Map the software onto the hardware place tasks on processors
  • 61. Task names
      TASK example2
      TASK mainctrl
      TASK disp_raw
      TASK disp_cor
      TASK UI
      TASK correlator
  • 62. Task ports
      TASK example2   INS=3 OUTS=7
      TASK mainctrl   INS=1 OUTS=1
      TASK disp_raw   INS=2 OUTS=0
      TASK disp_cor   INS=2 OUTS=0
      TASK UI         INS=1 OUTS=1
      TASK correlator INS=1 OUTS=1
  • 63. Task stack & heap
      TASK example2   INS=3 OUTS=7 DATA=500K
      TASK mainctrl   INS=1 OUTS=1 DATA=200K
      TASK disp_raw   INS=2 OUTS=0 DATA=200K
      TASK disp_cor   INS=2 OUTS=0 DATA=200K
      TASK UI         INS=1 OUTS=1 DATA=200K
      TASK correlator INS=1 OUTS=1 DATA=32K
  • 64. Task starting priorities
      TASK example2 urgent       INS=3 OUTS=7 DATA=500K
      TASK mainctrl              INS=1 OUTS=1 DATA=200K
      TASK disp_raw              INS=2 OUTS=0 DATA=200K
      TASK disp_cor              INS=2 OUTS=0 DATA=200K
      TASK UI urgent             INS=1 OUTS=1 DATA=200K
      TASK correlator priority=2 INS=1 OUTS=1 DATA=32K
      ! The starting priority is 1 unless explicitly stated.
  • 65. Channel creation
      ! channel     output port     input port
      ! =======     ===========     ==========
      CONNECT C1    UI[0]           example2[0]
      CONNECT C2    example2[5]     mainctrl[0]
      CONNECT C3    mainctrl[0]     example2[2]
      CONNECT C4    example2[0]     disp_raw[0]
      CONNECT C5    example2[1]     disp_raw[1]
      CONNECT C6    example2[2]     correlator[0]
      CONNECT C7    correlator[0]   example2[1]
      CONNECT C8    example2[3]     disp_cor[0]
      CONNECT C9    example2[4]     disp_cor[1]
      CONNECT C10   example2[6]     UI[0]
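The port counts on the TASK lines and the port numbers on the CONNECT lines have to agree. As an illustration, the C sketch below checks a connection table against the declared port counts. It is not part of Diamond and does not parse a configuration file; `task_ports`, `connection` and `check_connections` are hypothetical names, and tasks are identified by an array index chosen by the caller. It assumes ports are numbered from 0, as the example above suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Port counts declared for one task (the INS= and OUTS= attributes). */
typedef struct { int ins; int outs; } task_ports;

/* One CONNECT line: output port of task `from` to input port of task `to`.
   `from` and `to` are indices into the task array. */
typedef struct { int from; int out_port; int to; int in_port; } connection;

/* Count problems in the wiring: a port index outside the declared range,
   or a port that an earlier connection has already used.
   Returns 0 when the table is consistent. */
int check_connections(const task_ports *tasks, size_t n_tasks,
                      const connection *conns, size_t n_conns)
{
    int errors = 0;
    for (size_t i = 0; i < n_conns; i++) {
        const connection *c = &conns[i];
        assert(c->from >= 0 && (size_t)c->from < n_tasks);
        assert(c->to >= 0 && (size_t)c->to < n_tasks);

        /* Range checks against OUTS= and INS=. */
        if (c->out_port < 0 || c->out_port >= tasks[c->from].outs) errors++;
        if (c->in_port < 0 || c->in_port >= tasks[c->to].ins) errors++;

        /* A channel joins one output to one input, so no port may repeat. */
        for (size_t j = 0; j < i; j++) {
            if (conns[j].from == c->from && conns[j].out_port == c->out_port)
                errors++;
            if (conns[j].to == c->to && conns[j].in_port == c->in_port)
                errors++;
        }
    }
    return errors;
}
```

Filling the two arrays with the six tasks and the ten channels C1 to C10 from the slides gives a result of 0: every declared port of `example2` (three inputs, seven outputs) is used exactly once.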
  • 66. The processor & placement
      PROCESSOR Root SMT365_8_1
      …
      PLACE mainctrl   Root
      PLACE example2   Root
      PLACE disp_raw   Root
      PLACE disp_cor   Root
      PLACE UI         Root
      PLACE correlator Root
  • 67. Processor types Diamond supports all of the Sundance TIMs. The ProcType utility will display them all.
  • 68. A note about memory With CCS you need to: specify memory explicitly. know which “sections” are used by the compiler allocate memory explicitly at the start Diamond can do all memory allocation available memory determined automatically no linker command files but, you can tell Diamond how to use memory this is an optimisation once the code is working. ignore it until the program’s needs are understood.
  • 69. Building & Running
      Compile each task with the command: 3L C
      Link each task with the command: 3L T
      Configure with the command: 3L A
      Execute with the command: 3L X
  • 70. Making it run faster
  • 71. Use a second processor We shall use TIM1 (SMT365) and TIM4 (SMT361) connected by comports 0 & 3 respectively.
  • 72. Demonstration Hardware SMT365 SMT370 SMT374 SMT361
  • 73. Use a second processor
      PROCESSOR Root SMT365_8_1
      …
      PLACE mainctrl   Root
      PLACE example2   Root
      PLACE disp_raw   Root
      PLACE disp_cor   Root
      PLACE UI         Root
      PLACE correlator Root
  • 74. Use a second processor
      PROCESSOR Root SMT365_8_1
      PROCESSOR Node SMT361
      …
      PLACE mainctrl   Root
      PLACE example2   Root
      PLACE disp_raw   Root
      PLACE disp_cor   Root
      PLACE UI         Root
      PLACE correlator Root
  • 75. Use a second processor
      PROCESSOR Root SMT365_8_1
      PROCESSOR Node SMT361
      WIRE W1 Root[CP:0] Node[CP:3]
      …
      PLACE mainctrl   Root
      PLACE example2   Root
      PLACE disp_raw   Root
      PLACE disp_cor   Root
      PLACE UI         Root
      PLACE correlator Root
  • 76. Use a second processor
      PROCESSOR Root SMT365_8_1
      PROCESSOR Node SMT361
      WIRE W1 Root[CP:0] Node[CP:3]
      …
      PLACE mainctrl   Root
      PLACE example2   Root
      PLACE disp_raw   Root
      PLACE disp_cor   Root
      PLACE UI         Root
      PLACE correlator Node
  • 77. Notes The two tasks have not changed in any way. Their connections have not changed. No need to recompile them or relink them. All we changed to move the tasks onto a second processor was the configuration file. We just built a new application by running the configuration command again (3L A). Loading the two processors is automatic.
  • 78. Making it go even faster
  • 79. Use the FPGA on the SMT365
      PROCESSOR Root SMT365_8_1
      PROCESSOR F FPGA
      …
      PLACE mainctrl   Root
      PLACE example2   Root
      PLACE disp_raw   Root
      PLACE disp_cor   Root
      PLACE UI         Root
      PLACE correlator Root
  • 80. The FPGA is already being used The FPGA is also used to support functions on the SMT365 DSP. Attaching the FPGA to its processor allows the configurer to include all the necessary logic to support the needed functions.
  • 81. Use the FPGA
      PROCESSOR Root SMT365_8_1
      PROCESSOR F FPGA ATTACH=Root
      …
      PLACE mainctrl   Root
      PLACE example2   Root
      PLACE disp_raw   Root
      PLACE disp_cor   Root
      PLACE UI         Root
      PLACE correlator Root
  • 82. Use the FPGA
      PROCESSOR Root SMT365_8_1
      PROCESSOR F FPGA ATTACH=Root
      WIRE W1 Root[SDB:0] F[SDB_DEVICE:0]
      …
      PLACE mainctrl   Root
      PLACE example2   Root
      PLACE disp_raw   Root
      PLACE disp_cor   Root
      PLACE UI         Root
      PLACE correlator Root
  • 83. Use the FPGA
      PROCESSOR Root SMT365_8_1
      PROCESSOR F FPGA ATTACH=Root
      WIRE W1 Root[SDB:0] F[SDB_DEVICE:0]
      …
      PLACE mainctrl   Root
      PLACE example2   Root
      PLACE disp_raw   Root
      PLACE disp_cor   Root
      PLACE UI         Root
      PLACE correlator F
  • 84. FPGA Tasks Placing a task on an FPGA instructs the configurer to look for an FPGA version of the task. This can be written using VHDL, Xilinx System Generator, Handel-C (Celoxica), or any other method you like.
  • 85. Building with FPGA The configurer will construct a Xilinx project for the FPGA. It will call the Xilinx tools to build a complete bitstream. The bitstream will be included in the single application file. The FPGA will be configured automatically as the application is loaded.
  • 86. Conclusion Diamond does a lot of the work for you. Diamond allows you to change your mind and alter processors and topology. Diamond gives a structured model for developing efficient applications. The Diamond model is the same for any number and any combination of processors: DSP or FPGA. Diamond simplifies developing multiprocessor applications.