That is the question! One of the key decisions in embedded systems hardware/software co-design is hardware/software partitioning, that is, which functions go in hardware and which ones go in software. This interesting article from Programmable Logic DesignLine discusses the matter …
By Geno Valente, XtremeData
Today’s systems architects have a tough enough job solving difficult architectural problems for applications like 40G line cards, HD video transcoding systems, and next-generation RADAR applications. However, the most difficult part of their job is that they also need to draw the difficult line between hardware and software: keeping manpower requirements in the equation, keeping costs in check, and architecting a solution that can be built inside the market-required timeframe. This is all in a day’s work for some, but an almost impossible task for many.
For 2008, the industry buzzword is “Hardware Acceleration”. CPU vendors are integrating custom IP into their chips. AMD and Intel are creating ecosystems, named Torrenza and QuickAssist, for third-party accelerators. GPU vendors are setting their sights on general-purpose functionality. Meanwhile, a host of other chip companies, too many to mention, are developing new products that target the High Performance Computing (HPC) market.
A simplistic and very common place for designers to start is to profile their C/C++ code, find the routines that consume most of the clock cycles, and focus the effort to increase performance or remove bottlenecks there.
So I think it is worth reading for my new job in Design Space Exploration for MPSoCs, which also use these hardware accelerators, implemented as third-party IP cores.
In the article the author cites some typical applications that accelerators can parallelize for improvements beyond a factor of 10x:
- Filters – FIR, IIR, Poly-Phase
- Fast-Fourier Transforms (FFT)
- Encryption – AES, TDES, DES, etc.
- Video Transcoding – MPEG2, H.264, VC-1, and others
- Compression – ZLIB, GZIP, etc.
- Bioinformatics – Smith Waterman, BLAST, ClustalW
- Random Number Generation (RNG) – SOBOL and Mersenne Twister for Monte-Carlo
- Medical Imaging – CT Back Projection
- Packet and Network Processing (IPv6, Deep Packet Inspection)
- Market Data – FIX, FAST FIX, OPRA, etc.
So if we want to find the algorithms in our code that are amenable to hardware acceleration, we must look for hints like these:
- Bit Level Processing with deep instruction pipelines.
- Vector Based Processing of large amounts of data.
Both of these analyze data in ways that do not fit the standard 32-bit or 64-bit instructions, which creates overhead for the CPU or GPU because it has fixed data and instruction sizes.
The author’s final reflection, about rethinking code in this new multicore world, is also interesting:
The industry is beginning to realize that many of the implementations that we use today are the way they are because of 40 years of von Neumann programming and thinking. Since college (or even earlier), many of today’s programmers have been taught, trained, and led into thinking that a serial approach is the best and only approach to every problem.
Yes, I think we must change our minds and move away from serial …
Finally, since FPGAs are the mainstream way to do hardware acceleration, vendor-independent approaches like OpenFPGA must be taken into account when developing FPGA-based acceleration.