NePSim: A Network Processor Simulator
A network processor (NP) is a new breed of microprocessor that integrates a parallel processing design on a single chip to run complex algorithms, deep packet inspection, traffic management, and packet forwarding at wire speed. It offers both high performance and programmability, a combination that neither an ASIC nor a general-purpose CPU can achieve. Typical NPs employ multithreaded parallel processing elements (PEs) to keep up with rapidly growing Internet packet processing demands. The PEs can be programmed in a parallel or pipelined fashion, depending on the nature of the processing tasks.
There is increasing interest in NP architecture design for better performance and energy efficiency. However, there has not been an open-source simulation infrastructure that makes the performance/power tradeoffs in NPs clearly visible to computer architects. NePSim is the first open-source integrated infrastructure for analyzing and quantifying NP performance and power dissipation at the architecture level. NePSim contains a cycle-accurate simulator for a typical NP architecture (Intel’s IXP series), an automatic verification framework for testing and validation, and a power estimation model for measuring the power consumption of the simulated NP.
Our performance-power study shows that an NP’s power consumption increases faster than its performance, so low-power techniques will be critical for future NP designs. We proposed two schemes to reduce power dissipation: dynamic voltage scaling (DVS) and clock gating. DVS exploits the variance in PE utilization, lowering voltage and frequency when the processor has low activity and raising them when peak performance is required. DVS can save up to 17% of power consumption with less than 6% performance loss. Clock gating turns off a subset of PEs when the packet processing load is low and turns them back on when the load is high. Clock gating saves power at a coarse granularity and is particularly useful when the network traffic volume has high variance. With real-world network traces, our experiment (Figure 2) showed that the clock gating scheme can reduce power consumption by up to 30% with no packet loss and little impact on overall throughput.
While cycle-accurate simulation tools have been widely used to measure chip performance and power, this approach will be hindered by the increasing simulation complexity of multi-core, multithreaded architectures. Because NP applications are specialized, existing simulation acceleration methods cannot be applied to NP simulation without modification. We proposed a new scheme that uses stratified random sampling to choose a reduced, representative trace input for NP simulation. Our experiments showed that this approach reduces simulation time by an order of magnitude for seven NP benchmarks, with the error bounded within 3% at 95% confidence.
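To illustrate the general idea of stratified random sampling on a packet trace, here is a minimal Python sketch. It is not the scheme from the cited paper: the choice of packet size as the stratification variable, the bucket boundaries in `size_bucket`, the proportional allocation, and the synthetic trace are all assumptions made only for this example.

```python
import random
from collections import defaultdict


def stratified_sample(packets, stratum_of, fraction, seed=0):
    """Return roughly `fraction` of `packets`, sampled per stratum.

    `stratum_of` maps a packet to a stratum label (hypothetical helper).
    """
    rng = random.Random(seed)

    # Group packet positions by stratum.
    strata = defaultdict(list)
    for index, packet in enumerate(packets):
        strata[stratum_of(packet)].append(index)

    chosen = []
    for indices in strata.values():
        # Proportional allocation (an assumption), with at least one
        # packet per stratum so that no class of traffic disappears.
        k = min(len(indices), max(1, round(len(indices) * fraction)))
        chosen.extend(rng.sample(indices, k))

    chosen.sort()  # keep the original arrival order in the reduced trace
    return [packets[i] for i in chosen]


def size_bucket(packet_size):
    """Hypothetical strata based on packet size in bytes."""
    if packet_size <= 64:
        return "small"
    if packet_size <= 576:
        return "medium"
    return "large"


# Synthetic trace of packet sizes, for illustration only.
trace = [64] * 500 + [576] * 300 + [1500] * 200
reduced = stratified_sample(trace, size_bucket, fraction=0.1)
print(len(trace), "->", len(reduced), "packets")
```

Because each stratum is sampled independently, the reduced trace keeps the same mix of small, medium, and large packets as the full trace, which a plain uniform random sample only approximates.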
Figure 1. NePSim software structure.
Figure 2. Power saving vs. packet arrival rate using the clock gating low-power technique on an NP.
Yan Luo, Jun Yang, Laxmi Bhuyan, Li Zhao, “NePSim: A Network Processor Simulator with Power Evaluation Framework,” IEEE Micro, Sept/Oct 2004
Xi Chen, Yan Luo, Harry Hsieh, Laxmi Bhuyan, F. Balarin, “Utilizing Assertions for System Design of Network Processor,” Design Automation
Jia Yu, Wei Wu, Xi Chen, Harry Hsieh, Jun Yang, F. Balarin,
Automatic Design Exploration of DVS in Network Processor
Automation and Test in
Yan Luo, Jia Yu, Jun Yang, Laxmi Bhuyan, “Low Power Network Processor Design Using Clock Gating,” 42nd Design Automation Conference (DAC), 2005
Jia Yu, Jun Yang, Shaojie Chen, Yan Luo, Laxmi Bhuyan, “Enhancing Network Processor Simulation Speed with Statistical Input Sampling,” International Conference on High Performance Embedded Architectures & Compilers (HiPEAC), 2005