Warp processors are hybrid SoC (system-on-a-chip) devices that dynamically optimize software by synthesizing hardware implemented in an on-chip FPGA. From a software developer's point of view, a warp processor initially executes an application like any other microprocessor, but after some period of time the application transparently executes more efficiently, with improved performance and reduced energy. This transparency allows synthesis to be integrated into any existing application development tool flow, so developers can keep using their existing languages and compilers. Warp processors completely hide synthesis from software developers, who often avoid hardware design because register-transfer level specification is difficult and time-consuming. The dynamic nature of warp processing also enables optimizations that are not possible in existing static approaches, such as phase-based optimizations.
To perform synthesis at runtime, warp processors have a
specialized architecture capable of profiling the executing software,
decompiling the computation kernels, synthesizing the decompiled kernels, and then mapping,
placing, and routing the kernels into an on-chip FPGA. The main challenge in
designing warp processors is building these CAD tools, which must run
in an on-chip environment - a difficult task considering that such tools
typically run on powerful workstations. We have implemented a complete on-chip CAD tool
flow that executes in just several seconds on an ARM microprocessor, resulting
in a hardware/software system that is often 10x faster than software
execution. We are currently extending warp processors to handle multithreaded
applications, by synthesizing custom accelerators for executing threads. Early
results show that multithreaded warp processing can achieve more than 100x
speedups compared to software execution on multi-core systems with
up to 64 cores.
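The profiling step described above can be illustrated with a small sketch. It assumes, purely for illustration, that the profiler sees a trace of taken branches and treats frequently taken backward branches as candidate loop kernels; the function name, the trace format, and the addresses are hypothetical and are not taken from the actual on-chip profiler.

```python
from collections import Counter

def find_hot_loops(branch_trace, top_n=1):
    """Rank candidate loops by how often their backward branch is taken.

    branch_trace: iterable of (branch_pc, target_pc) pairs for taken branches.
    A taken branch whose target is at or below its own address is treated
    as a loop back edge. This is a heuristic, not a guarantee.
    """
    counts = Counter()
    for branch_pc, target_pc in branch_trace:
        if target_pc <= branch_pc:  # backward branch: likely closes a loop
            counts[(target_pc, branch_pc)] += 1
    # Each entry is ((loop_start, loop_end), times_taken), most frequent first.
    return counts.most_common(top_n)

# Hypothetical trace: an inner loop whose back edge is taken 50 times,
# an outer loop whose back edge is taken 5 times, and one forward branch.
trace = [(0x120, 0x100)] * 50 + [(0x140, 0x0F0)] * 5 + [(0x090, 0x200)]
hot = find_hot_loops(trace, top_n=2)
```

In this sketch the most frequently iterated loop is ranked first, and it would be the natural candidate to hand to decompilation and synthesis; forward branches are ignored because they cannot close a loop.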
Much of my research has focused on one of the enabling technologies of warp
processors - synthesis from software binaries. Because the dynamic synthesis performed by warp processors must
operate on a software binary as opposed to high-level code, the resulting
hardware can potentially be much slower than hardware synthesized from high-level code, due
to the loss of high-level information during software compilation. To make
synthesis from software binaries feasible, I have adapted existing decompilation techniques
and introduced new
techniques to recover the high-level information needed for effective synthesis.
Using these techniques, I have shown that for many applications, including
a commercial H.264 decoder,
synthesis from software binaries can in fact achieve results similar, or even
identical, to those of high-level synthesis approaches. Synthesis from
software binaries can also be used independently of warp processors, providing
similar transparency advantages for desktop CAD, in addition to supporting
synthesis of library code, legacy code, and hand-optimized assembly.
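As one illustration of what recovering high-level information from a binary can involve, the sketch below rebuilds loop structure from a control-flow graph using the textbook dominator and back-edge method. It is an assumed example of a standard decompilation step, not necessarily one of the techniques referred to above; the block names and the `cfg` dictionary format are hypothetical, and every block is assumed to be reachable from the entry block.

```python
def dominators(cfg, entry):
    """Iteratively compute, for each block, the set of blocks that dominate it.

    cfg maps each basic block to the list of its successor blocks.
    Assumes every block is reachable from entry.
    """
    nodes = set(cfg)
    preds = {n: [p for p in cfg if n in cfg[p]] for n in nodes}
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            if not preds[n]:
                continue
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom, preds

def natural_loops(cfg, entry):
    """Return {loop_header: set_of_blocks_in_loop} for each natural loop."""
    dom, preds = dominators(cfg, entry)
    loops = {}
    for tail, succs in cfg.items():
        for head in succs:
            if head not in dom[tail]:
                continue  # not a back edge
            # Collect every block that can reach tail without passing head.
            body = {head, tail}
            stack = [tail] if tail != head else []
            while stack:
                n = stack.pop()
                for p in preds[n]:
                    if p not in body:
                        body.add(p)
                        stack.append(p)
            loops.setdefault(head, set()).update(body)
    return loops

# Hypothetical CFG of a simple counted loop recovered from a binary.
cfg = {
    "entry": ["head"],
    "head": ["body", "exit"],
    "body": ["head"],
    "exit": [],
}
loops = natural_loops(cfg, "entry")
```

Once loops are identified this way, a synthesis tool has a structured region to work with rather than a flat sequence of branches, which is the kind of information that compilation to a binary discards.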
Greg Stitt, Department of Computer Science and Engineering, College of Engineering, University of California, Riverside, Riverside, CA 92521, (909) 787-2373, email@example.com