C.H.Ho, P.H.W.Leong, W.Luk (Imperial College), Andy Yan, Rebecca Cheng, Steve Wilton (University of British Columbia), Sergio Lopez-Buedo (Universidad Autonoma de Madrid)
Since their introduction in 1985, Field-Programmable Gate Arrays (FPGAs) have seen phenomenal growth in their ability to implement large, complex digital circuits. Originally used primarily for prototyping and small glue-logic replacement, FPGAs are now used to implement entire systems containing memory, embedded processors, and other embedded functionality. A 1994 databook quotes a maximum gate count of 25,000; in July 2001, a part that can implement circuits containing six million system gates was announced. The achievable clock frequency has increased over the years as well.
Much of this dramatic improvement has been the result of architectural improvements. There have been numerous academic and industrial investigations, including logic block studies, routing architecture studies, and memory block studies. In general, each of these studies considers one or a handful of architectural parameters in isolation and finds "good" values for those parameters experimentally. During the experiments, a handful of realistic benchmark circuits are typically fed through a representative CAD tool. Detailed models are then used to measure the area or delay of each circuit, and, based on these results, one of the architectures is deemed "the best".
We have investigated two aspects of this architectural methodology.
First, we consider Virtual Embedded Blocks. Embedded elements, such as block multipliers, are increasingly used in advanced FPGA devices to improve speed, area and power efficiency. We have developed a methodology for assessing the impact of such embedded elements on efficiency. The methodology involves creating dummy elements, called Virtual Embedded Blocks (VEBs), in the FPGA to model the size, position and delay of the embedded elements. The standard design flow offered by FPGA and CAD vendors can then be used for mapping, placement, routing and retiming of designs containing VEBs, and the speed and resource utilisation of the resulting designs can be inferred using the FPGA vendor's timing analysis tools. We illustrate this methodology by evaluating several schemes that use embedded elements to support floating-point computation.
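The summary does not say how a VEB's footprint is derived. As a minimal sketch, assuming the embedded element's estimated silicon area and the area of one logic tile are both known, and assuming the VEB is placed as a rectangle of fixed height (as in a column-based layout), the footprint could be sized as below. The class and function names, the fixed-height rule and the numbers in the usage line are all hypothetical illustrations, not measurements or the authors' tool.

```python
import math
from dataclasses import dataclass


@dataclass
class VirtualEmbeddedBlock:
    """Hypothetical descriptor for a dummy block standing in for an embedded element."""
    name: str
    width_tiles: int   # footprint width, in logic tiles
    height_tiles: int  # footprint height, in logic tiles (assumed fixed)
    delay_ns: float    # assumed delay of the embedded element being modelled

    @property
    def tiles(self) -> int:
        return self.width_tiles * self.height_tiles


def size_veb(name: str, block_area_um2: float, tile_area_um2: float,
             height_tiles: int, delay_ns: float) -> VirtualEmbeddedBlock:
    """Size a VEB so its footprint covers at least the embedded element's area.

    Height is fixed by assumption; width is rounded up so area is never
    under-counted.
    """
    if block_area_um2 <= 0 or tile_area_um2 <= 0 or height_tiles <= 0:
        raise ValueError("areas and height must be positive")
    tiles_needed = math.ceil(block_area_um2 / tile_area_um2)
    width = math.ceil(tiles_needed / height_tiles)
    return VirtualEmbeddedBlock(name, width, height_tiles, delay_ns)


# Illustrative numbers only, not measured data.
veb = size_veb("fp_mult", block_area_um2=50000.0, tile_area_um2=4000.0,
               height_tiles=4, delay_ns=3.5)
```

Rounding up trades a little over-estimated area for never under-estimating it; a real study would calibrate both area and delay against the embedded element being proposed.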
Second, we consider the sensitivity of such experiments. Relying on their results is dangerous: no matter how careful a researcher is, assumptions and approximations must be made. In some cases, these assumptions and approximations may affect the results of the experiments, and possibly even change their conclusions. Some of these assumptions can be categorized as follows:
CAD Tools: Clearly, the CAD tools employed for the architectural study will have a significant impact on the results. This includes not only placement and routing tools, but also the optimization and technology-mapping algorithms. In some cases, companies will run experiments using a pre-release experimental tool flow. The intention is that the final release software will be similar, but there will likely be some changes, and these changes may affect the architectural results. In academic studies, representative tools such as FlowMap and VPR are often used to make the results as vendor-neutral as possible. Yet these tools could lead to results that would not be seen had commercial tools been employed.
CAD Tool Settings: Most tools have numerous settings that can be used to guide the optimization algorithms. The documentation that accompanies VPR and T-VPACK has over six pages describing the run-time switches available; many of these switches will significantly affect the results of the optimization, and perhaps the conclusions of architectural experiments.
Experimental Techniques: There are several ways to use a CAD tool to evaluate an architecture. As an example, many researchers allow the number of tracks in each FPGA channel to "float". That is, they find the minimum number of tracks needed in each channel to successfully route a circuit, and use an FPGA with exactly that number (or a fixed multiple of that number) in comparisons. On the other hand, many commercial studies (in which the researchers have a fixed device in mind) assume a fixed number of tracks per channel. Each of these techniques may lead to different results, and perhaps different conclusions. As another example, many experiments are performed assuming the I/O connections to each benchmark circuit can be assigned to any I/O pin; others assume the pin assignment is predetermined and fixed.
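The "floating" channel-width approach can be sketched as a binary search for the smallest routable width, followed by scaling to a fixed multiple. This is a minimal sketch, not the procedure of any particular tool: `routes(w)` is a hypothetical stand-in for a full place-and-route run at channel width `w`, routability is assumed to be monotonic in `w` (only approximately true for heuristic routers), and the 1.2 multiplier is an assumed example value.

```python
import math
from typing import Callable


def min_channel_width(routes: Callable[[int], bool], lo: int = 1, hi: int = 1024) -> int:
    """Smallest W in [lo, hi] for which routes(W) succeeds.

    Assumes monotonic routability: if a circuit routes with W tracks per
    channel, it also routes with any W' > W.
    """
    if not routes(hi):
        raise ValueError("circuit does not route even at the upper bound")
    while lo < hi:
        mid = (lo + hi) // 2
        if routes(mid):
            hi = mid      # mid routes, so the minimum is at or below mid
        else:
            lo = mid + 1  # mid fails, so the minimum is above mid
    return lo


def evaluation_width(w_min: int, multiple: float = 1.2) -> int:
    """Channel width used in comparisons: a fixed multiple of W_min (multiple is assumed)."""
    return math.ceil(w_min * multiple)
```

A fixed-device study would instead call `routes` once at the device's channel width and record pass or fail, which is one way the two techniques can lead to different conclusions.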
Orthogonal Architecture Assumptions: When investigating the effects of one architectural parameter, it is usually necessary to fix several other parameters. As an example, when performing logic block studies, the routing fabric architecture is often fixed. Yet, it is conceivable that later changes in the routing fabric may influence the optimum logic block architecture.
In this project, we have examined the sensitivity of FPGA architectural research to experimental variations. In order to make our study concrete, we focus on four previously published fundamental FPGA architectural experiments:
For each of these experiments, we investigated how sensitive the conclusions are to experimental variations. It is important to note that we did not set out to answer these questions ourselves; they have been answered well in the previous works, and in most cases the conclusions are well known. Our goal was to determine how sensitive these conclusions are to experimental variations. Note also that it is the conclusions we care about; there are many cases in which the raw data changes significantly but the overall conclusions of the study remain the same.
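The distinction between raw data and conclusions can be made concrete with a small check. As a minimal sketch, assuming each experiment reduces to a cost per candidate architecture (lower is better, e.g. area-delay product) and that the conclusion is simply which architecture wins, a variant preserves the conclusion if it picks the same winner as the baseline, however much the raw costs shift. The function names, architecture labels and cost values below are hypothetical.

```python
from typing import Dict


def best_architecture(costs: Dict[str, float]) -> str:
    """Architecture with the lowest cost in one experiment (ties go to the first listed)."""
    return min(costs, key=costs.get)


def conclusion_preserved(baseline: Dict[str, float],
                         variants: Dict[str, Dict[str, float]]) -> Dict[str, bool]:
    """For each experimental variant, report whether it picks the baseline's winner.

    Only the identity of the best architecture is compared; the raw cost
    values are allowed to differ arbitrarily.
    """
    winner = best_architecture(baseline)
    return {name: best_architecture(costs) == winner
            for name, costs in variants.items()}


# Hypothetical costs for three candidate architectures A, B and C.
baseline = {"A": 10.0, "B": 9.0, "C": 9.5}
variants = {
    "fixed_channel_width": {"A": 20.0, "B": 15.0, "C": 18.0},  # raw data shifts, winner same
    "different_tool_flow": {"A": 8.0, "B": 8.5, "C": 9.0},     # winner changes
}
```

Here `conclusion_preserved(baseline, variants)` reports the first variant as preserving the conclusion despite much larger costs, and the second as overturning it.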