ACM Journal on

Emerging Technologies in Computing (JETC)

Latest Articles

Fully Exploiting PCM Write Capacity Within Near Zero Cost Through Segment-Based Page Allocation

Improving the endurance of phase change memory (PCM) is a fundamental issue when PCM technology is...

Reducing System Power Consumption Using Check-Pointing on Nonvolatile Embedded Magnetic Random Access Memories

The most widely used embedded memory technology, static random access memory (SRAM), is heading toward scaling problems in advanced technology nodes...

Rethinking Computer Architectures and Software Systems for Phase-Change Memory

With dramatic growth of data and rapid enhancement of computing powers, data accesses become the bottleneck restricting overall performance of a...

Reversible Synthesis of Symmetric Functions with a Simple Regular Structure and Easy Testability

In this article, we introduce a novel method of synthesizing symmetric Boolean functions with...

Neuromorphic Processors with Memristive Synapses

Due to their nonvolatile nature, excellent scalability, and high density, memristive nanodevices provide a promising solution for low-cost on-chip storage. Integrating memristor-based synaptic crossbars into digital neuromorphic processors (DNPs) may facilitate efficient realization of brain-inspired computing. This article investigates...

Impact of Fin Width Scaling on RF/Analog Performance of Junctionless Accumulation-Mode Bulk FinFET

In this article, the RF and analog performance of junctionless accumulation-mode bulk FinFETs is...

Area Minimization Synthesis for Reconfigurable Single-Electron Transistor Arrays with Fabrication Constraints

Power dissipation has become a pressing issue of concern in the designs of most electronic system as...

Comparative Area and Parasitics Analysis in FinFET and Heterojunction Vertical TFET Standard Cells

Vertical tunnel field-effect transistors (VTFETs) have been extensively explored to overcome the scaling limits and to improve on-current (ION)...

Designing a Million-Qubit Quantum Computer Using a Resource Performance Simulator

The optimal design of a fault-tolerant quantum computer involves finding an appropriate balance between the burden of large-scale integration of noisy...

Quantum-Logic Synthesis of Hermitian Gates

In this article, the problem of synthesizing a general Hermitian quantum gate into a set of primary quantum gates is addressed. To this end, an extended version of the Jacobi approach for calculating the eigenvalues of Hermitian matrices in linear algebra is considered as the basis of the proposed synthesis method. The quantum circuit synthesis...

Embedding of Large Boolean Functions for Reversible Logic

Reversible logic represents the basis for many emerging technologies and has recently been intensively studied. However, most of the Boolean functions...


New JETC Editor-in-Chief

The Journal of Emerging Technologies in Computing Systems is happy to welcome Prof. Yuan Xie (University of California at Santa Barbara) as the incoming Editor in Chief! We are also grateful to Prof. Krish Chakrabarty for serving as Editor in Chief for the last six years, and we wish them both all the best in the future!

New options for ACM authors to manage rights and permissions for their work

ACM introduces a new publishing license agreement, an updated copyright transfer agreement, and a new author-pays option which allows for perpetual open access through the ACM Digital Library. For more information, visit the ACM Author Rights webpage.

About JETC


The Journal of Emerging Technologies in Computing Systems invites submissions of original technical papers describing research and development in emerging technologies in computing systems. Major economic and technical challenges are expected to impede the continued scaling of semiconductor devices. This has resulted in the search for alternate mechanical, biological/biochemical, nanoscale electronic, asynchronous and quantum computing and sensor technologies. 


Spintronics: Emerging Ultra-Low Power Circuits and Systems beyond MOS Technology

Shielding STT-RAM Based Register files on GPUs Against Read Disturbance

Emerging spin-transfer torque RAM (STT-RAM) technology has been intensively studied for building GPU register files with better energy efficiency, thanks to its low leakage power, high density, and good scalability. However, STT-RAM suffers from read disturbance, which stems from the fact that the difference between read current and write current becomes smaller as technology scales. Read disturbance leads to high error rates for read operations, which cannot be effectively protected by SECDED ECC on the large-capacity register files of GPUs. To combat read disturbance, we propose a novel software-hardware co-designed solution, Red-Shield, which consists of three optimizations to overcome the limitations of existing solutions. First, we identify dead reads at compile time and augment instructions to avoid unnecessary restores. Second, we employ a small read buffer to accommodate register reads with high access locality to further reduce restores. Third, we propose an adaptive restore mechanism that selects a suitable restore scheme according to the busy status of the corresponding register banks. Experimental results show that our proposed design can effectively mitigate the performance loss and energy overhead caused by restore operations, while still maintaining the reliability of reads.

Alleviate Pin Constraints for Multi-Core Processors Through On/Off-Chip Power Delivery System Design

The number of chip pins is limited due to the cost and reliability issues of sophisticated packages. It is predicted that chip pins will be overstretched to satisfy the requirements of power delivery and memory access. The gap will increase as technology scales, due to the increasing computation resources and supply current. Pin reduction techniques are required for continued computing performance growth. In this paper, we propose a chip pin constraint alleviation strategy through on/off-chip power delivery system co-design to effectively reduce the demand for power pins. An analytical model of the power delivery system, consisting of on/off-chip regulators and the power delivery network, is proposed to evaluate the influence of regulator design and package conduction loss. By combining it with a multi-core processor model of performance and memory bandwidth, we characterize the entire system to investigate the chip pin constraint in multi-core processor scaling and the effectiveness of our strategy. Our strategy achieves a significant pin count reduction, e.g., 31.3% at the 8nm technology node. Given the same chip pin count, it improves chip performance by 35.0% at 8nm compared to the conventional design. For real applications of different parallelism, our strategy outperforms counterparts with a 16.8% performance improvement at 8nm.

A Fault Tolerant Ripple-Carry Adder with Controllable-Polarity Transistors

This paper deals with the effects of faults on circuits implemented with controllable-polarity transistors. We propose a new fault model that suits the characteristics of these devices, and report the results of a SPICE-based analysis of the effects of faults on the behavior of some basic gates implemented with them. We show that the considered devices are able to intrinsically tolerate a rather high number of faults. We finally exploit this property to build a robust and scalable adder that is shown to tolerate all single faults and more than 99.5% of double faults. Its area, performance, and leakage power characteristics are improved by 15%, 18%, and 12%, respectively, when compared to an equivalent FinFET solution at the 22-nm technology node.

Mobile Unified Memory-Storage Structure based on Hybrid Non-Volatile Memories

In mobile computing systems, the limited amount of main memory space leads to page swap operation overhead and data duplication in both main memory and secondary storage. Furthermore, SQLite write operations in mobile devices such as smartphones and tablet PCs tend to frequently overwrite data to storage, significantly degrading performance. Thus, this paper presents a unified memory-storage structure that is optimized for mobile devices and blurs the boundary between the existing main memory layer and secondary storage layer. The unified memory-storage structure consists of a Dynamic RAM (DRAM) based dual buffering module; a hybrid unified memory-storage array consisting of DRAM, an SLC/MLC hybrid 3D cross point array, and NAND Flash memory; and an associated unified storage translation layer, a system software module devised for memory address and file translation. This hybrid array of non-volatile memories is formed as a single memory-disk integrated storage space that can be logically divided into static and dynamic spaces. Experimental results show that the overall performance of the hybrid unified memory-storage system with the buffering structure increases by around 59% and power consumption improves by 30%, compared to a previous integrated memory-disk system.


Spin Transfer Torque MRAMs (STT-MRAMs) are attractive due to their non-volatility, high density, and zero leakage. However, STT-MRAMs suffer from poor reliability due to shared read and write paths. Additionally, conflicting requirements for data retention and write-ability (both related to the energy barrier height of the magnet) make design more challenging. To address the poor reliability of STT-MRAMs, the use of Error Correcting Codes (ECC) has been proposed. Unlike in traditional CMOS memory technologies, ECC is expected to correct both soft and hard errors in STT-MRAMs. To achieve acceptable yield with low write power, stronger ECC is required, resulting in an increased number of encoded bits and degraded memory efficiency. In this paper, we propose Failure aware ECC (FaECC), which masks permanent faults while maintaining the same correction capability for soft errors without increasing the number of encoded bits. Furthermore, we analyze the impact of process variations on the lifetime of the free layer and on retention failures. We also develop a cross-layer simulation framework that consists of device-, circuit-, and array-level analysis of STT-MRAM memory arrays. Our results show that using FaECC relaxes the requirements on the energy barrier height, which reduces the write energy and results in smaller access transistor size and memory array area.

Distributed In-Memory Computing on Binary RRAM Crossbar

The recently emerging resistive random-access memory (RRAM) can provide not only non-volatile memory storage but also intrinsic computing for matrix-vector multiplication, which is ideal for low-power, high-throughput data analytics accelerators performed in memory. However, existing RRAM-crossbar based computing is mainly assumed to be multi-level analog computing, whose result is sensitive to process nonuniformity and which incurs additional overhead from AD-conversion and I/O. In this paper, we explore a matrix-vector multiplication accelerator on a binary RRAM-crossbar with adaptive 1-bit-comparator based parallel conversion. Moreover, a distributed in-memory computing architecture is developed with a corresponding control protocol. Both the memory array and the logic accelerator are implemented on the binary RRAM-crossbar, where logic-memory pairs can be distributed using a control-bus protocol. Experimental results show that, compared to the analog RRAM-crossbar, the proposed binary RRAM-crossbar achieves significant area savings with better calculation accuracy. Moreover, significant speedup can be achieved for matrix-vector multiplication in neural-network based machine learning, such that both the overall training time and the testing time are reduced. In addition, large energy savings can be achieved compared to the traditional CMOS-based out-of-memory computing architecture.

Energy-Efficient and Improved Image Recognition with Conditional Deep Learning

Deep learning networks have proven to be very successful for a wide range of recognition tasks across modern computing platforms. However, the computational requirements for such deep nets can be quite high, and hence their energy-efficient implementation is of great interest. Although traditionally the entire network is utilized for the recognition of all inputs, we observe that classification difficulty varies widely across inputs in real-world datasets; only a small fraction of inputs require the full computational effort of a network, while a large majority can be classified correctly with very low effort. In this paper, we propose Conditional Deep Learning (CDL), where convolutional layer features are used to identify the variability in the difficulty of input instances and conditionally activate the deeper layers of the network. The proposed methodology enables the network to dynamically adjust the computational effort depending upon the difficulty of the input data, yielding substantial energy savings on the MNIST/CIFAR-10 datasets. We further employ the conditional approach to train deep learning networks from scratch with integrated supervision from the additional output neurons appended at the convolutional layers. Our proposed integrated CDL training leads to an improvement in the gradient convergence behavior, giving substantial error rate reduction on MNIST/CIFAR-10.
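The conditional-activation idea in this abstract can be illustrated with a toy cascade. The sketch below is not the authors' CDL implementation: the function name `classify_conditionally`, the two stand-in stages `shallow` and `deep`, and the confidence threshold are all hypothetical, chosen only to show how an input can exit early when a cheap stage is already confident.

```python
def classify_conditionally(x, stages, threshold):
    """Run stages in order and stop at the first confident one.

    Each stage is a callable returning (label, confidence in [0, 1]).
    Returns (label, number_of_stages_run).
    """
    if not stages:
        raise ValueError("at least one stage is required")
    for depth, stage in enumerate(stages, start=1):
        label, confidence = stage(x)
        if confidence >= threshold:
            break  # easy input: skip the deeper, costlier stages
    return label, depth


# Hypothetical stand-ins for a shallow and a deep classifier.
def shallow(x):
    # Confident only when the input is far from the decision boundary at 0.
    return ("positive" if x > 0 else "negative", min(1.0, abs(x)))


def deep(x):
    # Assumed to always be confident, at higher cost.
    return ("positive" if x > 0 else "negative", 1.0)


easy = classify_conditionally(2.0, [shallow, deep], threshold=0.9)
hard = classify_conditionally(-0.1, [shallow, deep], threshold=0.9)
print(easy)  # ('positive', 1)
print(hard)  # ('negative', 2)
```

In this toy setting the energy saving comes from the fraction of inputs that stop after the first stage; how confidence is actually derived from convolutional features is left to the article.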

Power-Utility-Driven Write Management for MLC PCM

Phase change memory (PCM) is a promising alternative to DRAM as main memory due to its high density and low leakage power. Multi-level cell (MLC) PCM is more attractive than single-level cell (SLC) PCM because it can store multiple bits per cell to achieve higher density. With the iterative write technique, MLC writes demand higher power than DRAM writes, but the power supply of an MLC system is similar to that of DRAM. The mismatch between high write power and a limited power budget degrades write throughput and performance. In this work, we investigate both write scheduling policy and power management to improve MLC power utility and alleviate these negative impacts. We identify power-utility-driven write scheduling as an online bin-packing problem and then derive a power-utility-driven scheduling (PUDS) policy from the First-Fit algorithm to improve write power usage. Based on the SET ramp-down pulse characteristic, we propose the SET Power Amortization (SPA) policy, which proactively reclaims power tokens at the intra-SET level to improve power utilization. Our results demonstrate that a system with PUDS+SPA achieves a 60% performance increase and a 36% improvement in power utility over the state-of-the-art power management technique.
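For readers unfamiliar with the First-Fit heuristic mentioned above, here is a minimal sketch of online First-Fit bin packing. It is only the textbook algorithm, not the PUDS policy itself: treating each bin as a scheduling slot with a fixed power budget, and each item as a write's power demand in integer tokens, is an assumption made for illustration, and the name `first_fit` is hypothetical.

```python
def first_fit(demands, budget):
    """Online First-Fit: place each demand, in arrival order, into the first
    slot with enough remaining budget; open a new slot if none fits."""
    slots = []      # slots[i] lists the demands packed into slot i
    remaining = []  # remaining[i] is the unused budget of slot i
    for demand in demands:
        if demand > budget:
            raise ValueError("a single demand exceeds the slot budget")
        for i, free in enumerate(remaining):
            if demand <= free:
                slots[i].append(demand)
                remaining[i] -= demand
                break
        else:
            # No existing slot can hold this demand.
            slots.append([demand])
            remaining.append(budget - demand)
    return slots


# Six writes with assumed power demands, and a budget of 10 tokens per slot.
slots = first_fit([4, 8, 1, 4, 2, 1], budget=10)
print(slots)  # [[4, 1, 4, 1], [8, 2]]
```

Both slots end up fully used in this example, which is the kind of budget utilization a power-aware scheduler aims for; real write scheduling also has to respect timing and bank constraints that this sketch ignores.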

Design of approximate compressors for multiplication

Approximate computing has recently emerged as a promising technique for energy-efficient VLSI system design and is best suited for error-resilient applications, such as signal processing and multimedia. Approximate computing reduces accuracy, but it still provides meaningful results faster and usually with lower power consumption. This is particularly attractive for arithmetic circuits. In this paper, various novel design approaches for approximate 4-2 and 5-2 compressors are proposed for reducing the partial product stages during multiplication. Three approximate 8x8 Dadda multiplier designs using three novel approximate 4-2 compressors, and two approximate 8x8 Dadda multiplier designs using novel approximate 5-2 compressors, are proposed. Extensive simulation results show that the proposed designs achieve significant accuracy improvement together with power and delay reductions compared to previous approximate designs.

Non-volatile processor based on MRAM for ultra-low-power IoT devices

Over the last few years, a new era of smart connected devices has emerged in the market to enable the future world of the internet of things (IoT). A key requirement for IoT applications is low power consumption, which allows very high autonomy in battery-powered systems. Depending on the application, such devices will spend most of their time in a low-power mode (sleep mode) and will wake up only when there is a task to accomplish (active mode). Emerging non-volatile memory technologies are seen as a very attractive solution for designing ultra-low-power systems. Among these technologies, Magnetic Random Access Memory (MRAM) is a promising candidate as it combines non-volatility, high density, reasonable latency, and low leakage. Integrating non-volatility as a new feature of memories has great potential to allow full data retention after a complete shutdown with a fast wake-up time. This paper explores the benefits of having a non-volatile processor to enable ultra-low-power IoT devices.

Survey of STT-MRAM Cell Design Strategies: Taxonomy and Sense Amplifier Tradeoffs for Resiliency

Spin-Transfer Torque Magnetic Random Access Memory (STT-MRAM) has been explored as a post-CMOS technology for embedded and data storage applications seeking non-volatility, near-zero standby energy, and high density. Towards attaining these objectives for practical implementations, various techniques to mitigate the specific reliability challenges associated with STT-MRAM elements are surveyed, classified, and assessed herein. Cost and suitability metrics assessed include the area of nanomagnetic and CMOS components per bit, access time and complexity, sense margin, and energy or power consumption costs versus resiliency benefits. Solutions to the reliability issues identified are addressed within a taxonomy created to categorize the current and future approaches to reliable STT-MRAM designs. A variety of destructive and non-destructive sensing schemes are assessed for process variation tolerance, read disturbance reduction, sense margin, and write polarization asymmetry compensation. The highest resiliency strategies deliver a sensing margin above 300 mV while incurring low power and energy consumption on the order of picojoules and microwatts, respectively, while attaining read sense latency of a few nanoseconds down to hundreds of picoseconds for non-destructive and destructive sensing schemes, respectively. Additional Key Words and Phrases: Spin-Transfer Torque storage elements, STT-MRAM, Magnetic Tunnel Junction (MTJ), Self-referencing schemes, Reliability, Process Variation, Read/Write Reliability

Structured Pruning of Deep Convolutional Neural Networks

Real-time application of deep learning algorithms is often hindered by high computational complexity and frequent memory accesses. Network pruning is a promising technique to solve this problem. However, pruning usually results in irregular network connections that not only demand extra representation effort but also do not fit well on parallel computation. We introduce structured sparsity at various scales for convolutional neural networks: feature-map-wise, kernel-wise, and intra-kernel strided sparsity. This structured sparsity is very advantageous for direct computational resource savings on embedded computers, parallel computing environments, and hardware-based systems. To decide the importance of network connections and paths, the proposed method uses a particle filtering approach. The importance weight of each particle is assigned by assessing the misclassification rate with the corresponding connectivity pattern. The pruned network is re-trained to compensate for the losses due to pruning. When implementing convolutions as matrix products, we show in particular that intra-kernel strided sparsity with a simple constraint can significantly reduce the size of the kernel and feature map tensors. The proposed work shows that when pruning granularities are applied in combination, we can prune the network by more than 70% with less than 1% loss.

Memory-Centric Reconfigurable Accelerator for Classification and Machine Learning Applications

Big Data refers to the growing challenge of turning massive, often unstructured datasets into meaningful, actionable data. As datasets grow from petabytes to exabytes and beyond, it becomes increasingly difficult to run advanced analytics, especially machine learning, in a reasonable time and on a practical power budget. Previous work has focused on accelerating analytics implemented as SQL queries on data-parallel platforms with off-the-shelf CPUs and GPGPUs. However, these systems are general-purpose, and still require a vast amount of data transfer between storage and computing elements, limiting system efficiency. Instead, we present a reconfigurable, memory-centric accelerator which operates at the last level of memory, dramatically reducing the energy required for data transfer and processing of machine learning applications. We functionally validate the framework using a hardware emulation platform and three representative applications: Naive Bayesian Classification, Convolutional Neural Networks, and k-Means Clustering. Results are compared with implementations on a modern CPU and GPU. Finally, the use of in-memory dataset decompression to further reduce data transfer volume is investigated. The system achieves an average energy efficiency improvement of 74x and 212x over GPU and single-threaded CPU, respectively, while dataset compression is shown to improve overall efficiency by an additional 1.8x on average.

High-Performance Computing with Quantum Processing Units

The prospects of quantum computing have driven efforts to realize fully functional quantum processing units (QPUs). Recent success in developing proof-of-principle QPUs has prompted the question of how to integrate these emerging processors into modern high-performance computing (HPC) systems. We examine how QPUs can be integrated into current and future HPC system architectures by accounting for functional and physical design requirements. We identify two integration pathways that are differentiated by infrastructure constraints on the QPU and the use cases expected for the HPC system. This includes a tight integration that assumes infrastructure bottlenecks can be overcome as well as a loose integration that assumes they cannot. We find that the performance of both approaches is likely to depend on the quantum interconnect that serves to entangle multiple QPUs. We also identify several challenges in assessing QPU performance for HPC, and we consider new metrics that capture the interplay between system architecture and the quantum parallelism underlying computational performance.

Exploiting Idle Hardware to provide Low Overhead Fault Tolerance for VLIW Processors

Because of technology scaling, the soft error rate has been increasing in digital circuits, which affects system reliability. Therefore, modern processors, including VLIW architectures, must have means to mitigate such effects to guarantee reliable computing. In this scenario, our work proposes three low-overhead fault tolerance approaches based on instruction duplication with zero-latency detection, which use a rollback mechanism to correct soft errors in the pipelanes of a configurable VLIW processor. The first uses idle issue slots within a period of time to execute extra instructions considering distinct application phases. The second works at a finer grain, adaptively exploiting idle functional units at run-time. However, some applications present high ILP (instruction-level parallelism), and so the ability to provide fault tolerance is reduced: fewer functional units will be idle, decreasing the number of potential duplicated instructions. The third approach attacks this issue by dynamically reducing ILP according to a configurable threshold, increasing fault tolerance at the cost of performance. While the first two approaches achieve significant fault coverage with minimal area and power overhead for applications with low ILP, the latter improves fault tolerance with low performance degradation. All approaches are evaluated considering area, performance, power dissipation, and error coverage.

SPARCNet: A Hardware Accelerator for Efficient Deployment of Sparse Convolutional Networks

Deep neural networks have been shown to outperform prior state-of-the-art solutions that often relied heavily on hand-engineered feature extraction techniques coupled with simple classification algorithms. In particular, deep convolutional neural networks have been shown to dominate on several popular public benchmarks such as ImageNet database. Unfortunately, the benefits of deep networks have yet to be fully exploited in embedded, resource-bound settings that have strict power and area budgets. In order to reduce power and area while still achieving required throughput, classification-efficient network architectures are required in addition to optimal deployment on efficient hardware. In this work, we target both of these enterprises. For the first objective, we analyze simple, biologically-inspired reduction strategies that are applied both before and after training. The central theme of the techniques is the introduction of sparsification to help dissolve away the dense connectivity that is often found at different levels in convolutional networks. In the second contribution, we propose SPARCNet: a hardware accelerator for efficient deployment of SPARse Convolutional NETworks. The accelerator looks to enable deploying networks in such resource-bound settings by exploiting efficient forms of parallelism and the proposed sparsification techniques.

A Fine-Grained, Uniform, Energy-Efficient Delay Element for 2-Phase Bundled-Data Circuits

Contemporary digitally controlled delay elements trade off power overheads and delay quantization error. This paper proposes a new programmable delay element that provides a balanced design that yields low power with moderate delay quantization error even under process, voltage, and temperature variations. The element employs and leverages the advantages offered by a 28nm FD-SOI technology, using back body biasing to add an extra dimension to its programmability. To do so, a novel generic delay shift block is proposed, which enables incorporating both fine and coarse delays in a single delay element that can be easily integrated into digital systems, an advantage over hybrid delay elements that rely on analog design.

Monolayer Transistor SRAMs: Towards Low-Power, Denser Memory Systems

Monolayer heterojunction FETs based on vertical transition metal dichalcogenides and planar black phosphorus FETs (BPFETs) have demonstrated excellent subthreshold swing, high ION/IOFF, and high scalability, making them attractive candidates for post-CMOS memory design. This paper explores TMDCFET and BPFET SRAM design by combining atomistic self-consistent device modeling with SRAM circuit design and simulation. We perform detailed evaluations of the TMDCFET/BPFET SRAMs at a single bitcell and at SRAM array level. Our simulations show that at low operating voltages, TMDCFET/BPFET SRAMs exhibit significant advantages in static power, dynamic read/write noise margin, and read/write delay over nominal 16nm CMOS SRAMs at both bitcell and array level implementations.

Sketching Computation with Stochastic Processing Engines

In conventional embedded computing, a sudden shortage of computing resources, such as premature termination or a power outage, often results in complete computing failure and produces totally unusable results. To circumvent this challenge, we present a novel technique that allows reconfigurable computing to achieve quality scalability by leveraging probabilistic principles. Our objective is to maximize the quality and usability of final results even under a sudden change of computing resources. This paper explores how to leverage stochastic principles to gracefully salvage partially finished results of embedded computing. Our work is inspired by the concept of incremental sketching frequently found in artistic rendering, where the drawing procedure consists of a series of steps, each gradually improving the quality of results. The essence of our approach is to encode the input signal as a probability density function, perform stochastic computing operations on the signal in the probabilistic domain, and decode the output signal by estimating the probability density function of the resulting random samples. To validate our proposed architecture design, we have implemented a proof-of-concept probabilistic convolver with a Virtex-6 FPGA device. Finally, we use three convolution-based image processing applications, image correspondence, image sharpening, and edge detection, to demonstrate that important embedded computing applications can indeed be sketched in a graceful manner.

A Survey of Techniques for Architecting Processor Components using Domain Wall Memory

Recent trends of increasing core count and the bandwidth/memory wall have motivated researchers to explore novel memory technologies for designing processor components such as caches, register files, and shared memory. Domain wall memory (DWM), also known as racetrack memory, is a promising emerging technology due to its non-volatility and very high density. However, the use of DWM presents challenges due to characteristics of both DWM itself (e.g., the requirement of shift operations and variable latency) and processor components. Recently, several techniques have been proposed to address these challenges. This paper presents a survey of architectural techniques for using DWM to design components in both CPUs and GPUs. We discuss techniques related to performance, energy, and reliability, and also discuss works that compare DWM with other memory technologies. We also highlight the opportunities and obstacles in using DWM for designing processor components. This survey is expected to spark further research in this area and be useful for researchers, chip designers, and computer architects.

Optimized standard cells for all-spin logic

All-Spin Logic (ASL) devices provide a promising spintronics-based alternative for Boolean logic implementations in the post-CMOS era. In principle, any logic functionality can be implemented in ASL. In practice, the performance of an ASL gate is significantly affected by layout choices, but such implications have not been adequately explored in the past. This paper proposes a systematic approach for building standard cells in ASL, which are a basic building block in an overall design methodology for implementing large ASL-based circuits. We first propose a new technique to reduce the magnet count for an ASL majority gate but still ensure correct functioning through layout optimization methods. Building upon physics-based analysis, we then build a standard cell library with diverse functionality and characterize the library for delay, energy and area. We perform delay-optimized technology mapping on ISCAS85 benchmark circuits using our library. Our approach results in circuits that are 19.69% faster, consume 17.77% less energy and are 33.56% more area efficient compared to a standard cell library that does not incorporate layout-based optimization techniques of our work.

Energy Neutral Design Framework for Supercapacitor-based Autonomous Wireless Sensor Networks

To design autonomous Wireless Sensor Networks (WSNs) with a theoretically infinite lifetime, energy harvesting (EH) techniques have recently been considered as promising approaches. In this paper, an efficient energy harvesting system compatible with various environmental sources such as light, heat, or wind energy is proposed. Our platform takes advantage of double-level capacitors not only to prolong the system lifetime but also to enable robust booting from an exhausted-energy state of the system. Simulations and experiments show that our Multiple Energy Sources Converter (MESC) can achieve a booting time on the order of seconds. Although capacitors have virtual recharge cycles, they suffer from higher leakage compared to rechargeable batteries. Increasing their size can decrease the system Quality of Service (QoS) due to leakage energy. Therefore, an energy neutral design framework is proposed that provides a methodology to determine the minimum size of the storage devices satisfying Energy Neutral Operation (ENO) and maximizing QoS in EH nodes when using a given energy source. Experiments validating this framework are performed on a real WSN platform with both photovoltaic cells and thermal generators in an indoor environment. Moreover, simulations on OMNET++ show that the energy storage optimized by our design framework is utilized up to 93.86%.

Towards Human-Scale Brain Computing Using 3D Wafer Scale Integration

Stochastic CBRAM based Neuromorphic Time Series Prediction System

In this research, we present a CBRAM (conductive-bridge RAM) based neuromorphic system which efficiently addresses time series prediction. We propose (i) a new voltage-mode stochastic multi-weight synapse circuit based on experimental bi-stable CBRAM devices, (ii) a voltage-mode neuron circuit based on the concept of charge sharing, and (iii) an optimized training methodology powered by a stochastic implementation of the least-mean-squares (SLMS) training rule. To validate the proposed design, we use time series prediction for short-term electrical load forecasting in smart grids. Our system is able to forecast hourly electrical loads with a mean accuracy of 96%, an estimated power dissipation of 15 µW and an area of 14.5 µm² at 65 nm CMOS technology.

Computing Polynomials using Unipolar Stochastic Logic

This paper addresses subtraction and polynomial computations using unipolar stochastic logic. Stochastic computing requires simple logic gates and stochastic logic based circuits are inherently fault-tolerant. While it is easy to realize multiplication and scaled addition, implementation of subtraction is non-trivial using unipolar stochastic logic. Additionally, an accurate computation of subtraction is critical for the implementation of polynomials with negative coefficients in stochastic unipolar representation. This paper, for the first time, demonstrates that instead of using well-known Bernstein polynomials, stochastic computation of polynomials can be implemented by using a stochastic subtractor and factorization. Three major contributions are made in this paper. First, two approaches are proposed to compute subtraction in stochastic unipolar representation. In the first approach, the subtraction operation is approximated by cascading multi-levels of OR and AND gates. In the second approach, the stochastic subtraction is implemented using a multiplexer and a stochastic divider. Second, computation of polynomials in stochastic unipolar format is presented using scaled addition and proposed stochastic subtraction. Third, we propose stochastic computation of polynomials using factorization. From experimental results, it is shown that the proposed stochastic logic circuits require less hardware complexity than the previous stochastic polynomial implementation using Bernstein polynomials.
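The two operations the abstract calls easy, multiplication and scaled addition, can be illustrated with a minimal Python sketch of generic unipolar stochastic computing. It is not the paper's circuit: the function names, stream length, and seed are illustrative assumptions, and the proposed subtractor and factorization are not reproduced here.

```python
import random

def to_stream(p, n, rng):
    """Encode a probability p in [0, 1] as an n-bit unipolar stream."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def value(stream):
    """Decode a unipolar stream: the fraction of 1s."""
    return sum(stream) / len(stream)

def sc_multiply(a, b):
    # AND of two independent streams: P(out = 1) = P(a = 1) * P(b = 1).
    return [x & y for x, y in zip(a, b)]

def sc_scaled_add(a, b, select):
    # Multiplexer driven by a p = 0.5 select stream: out = 0.5*a + 0.5*b.
    return [y if s else x for x, y, s in zip(a, b, select)]

rng = random.Random(0)      # fixed seed (assumption) for repeatability
n = 100_000                 # stream length (assumption)
a = to_stream(0.6, n, rng)
b = to_stream(0.3, n, rng)
sel = to_stream(0.5, n, rng)

product = value(sc_multiply(a, b))            # estimates 0.6 * 0.3
scaled_sum = value(sc_scaled_add(a, b, sel))  # estimates (0.6 + 0.3) / 2
print(product, scaled_sum)
```

Both results are estimates whose error shrinks as the stream grows. Because a unipolar stream can only encode values in [0, 1], a difference a - b has no direct single-gate counterpart, which is the difficulty the abstract addresses.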

System-Level Design to Detect Fault Injection Attacks on Embedded Real-Time Applications

Fault injection attacks have long been a critical threat to security-critical embedded systems, yet existing research has not addressed the problem from a system-level perspective. This paper presents an approach to the synthesis of secure real-time applications mapped on distributed embedded systems, which focuses on preventing fault injection attacks. We utilize a symmetric cryptographic service to protect confidentiality, and deploy fault detection within the confidential algorithm to resist fault injection attacks. Several fault detection schemes are identified, and their fault coverage rates and time overheads are derived and measured, respectively. Our synthesis approach determines the best fault detection schemes for the encryption/decryption of messages, such that the overall security strength of detecting fault injection attacks is maximized and the deadline constraint of the real-time applications is guaranteed. Since the problem is NP-hard, we propose an efficient algorithm based on the Fruit fly Optimization Algorithm (FOA), which achieves better results with lower time overheads than a simulated annealing algorithm. Extensive experiments and a real-life application evaluation demonstrate the superiority of our approach.

Ultra-low-leakage, Robust FinFET SRAM Design Using Multi-parameter Asymmetric FinFETs

Memory arrays consisting of static random access memory (SRAM) cells occupy the largest area on chip and are responsible for significant leakage power consumption in modern microprocessors. With the transition from planar CMOS technology to FinFETs, FinFET SRAM design has become important. However, increasing leakage power consumption of FinFETs due to aggressive scaling, width quantization, read-write conflict, and process variations make FinFET SRAM design challenging. In this paper, we show how multi-parameter asymmetric (MPA) FinFETs can be used to design ultra-low-leakage and robust 6T SRAM cells. We combine multiple asymmetries, namely asymmetry in gate workfunction, source/drain doping concentration, and gate underlap, to address various SRAM design issues all at once. We propose five novel MPA FinFET-based SRAM cell designs and compare them with symmetric and single-parameter asymmetric (SPA) FinFET-based SRAM cells using dc and transient metrics. We show that the leakage current of MPA FinFET-based SRAM cells can be reduced by up to 58x while ensuring reasonable read/write stability metrics. In addition, high stability metrics can be achieved with 22x leakage current reduction compared to the traditional symmetric FinFET-based SRAM cell. There is no area overhead associated with MPA FinFET-based SRAM cells.

PPU: A Control Error-Tolerant Processor for Streaming Applications with Formal Guarantees

Current error-tolerant processors allow errors in the computation, and are positioned to be suitable for error-tolerant applications such as media applications. For such processors, the Instruction-Set-Architecture (ISA) no longer serves as a specification, since it is acceptable for the processor to allow for errors during the execution of instructions. In this work, we address this specification gap by defining the minimal requirements that are needed in order for an error-tolerant processor to provide useful results. Further, we formally define properties that capture these requirements. Based on this, we propose PPU, an error-tolerant processor that aims to meet these requirements with low-cost microarchitectural support. These protection mechanisms convert potentially fatal control errors to potentially tolerable data errors instead of ensuring instruction-level or byte-level correctness. The protection mechanisms in PPU protect the system against crashes, unresponsiveness, and external device corruption. In addition, they also provide support for achieving acceptable result quality. Additionally, we provide a methodology that formally proves the specification properties on PPU using model checking. This methodology uses models for the hardware and software that are integrated with the fault and recovery models. Finally, we experimentally demonstrate the results of model checking and the application-level quality of results for PPU.

A Simplified Phase Model for Simulation of Oscillator Based Computing Systems

Building oscillator based computing systems with emerging nano-device technologies has become a promising solution for unconventional computing tasks like computer vision and pattern recognition. However, simulation and analysis of these systems is both time and compute intensive due to the non-linearity of new devices and the complex behavior of coupled oscillators. In order to speed up the simulation of coupled oscillator systems, we propose a simplified phase model to perform phase and frequency synchronization prediction based on a synthesis of earlier models. Our model can predict the frequency locking behavior with several orders of magnitude speedup compared to direct evaluation, enabling the effective and efficient simulation of the large numbers of oscillators required for practical computing systems.
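To make frequency locking concrete, here is a minimal sketch that uses the textbook Kuramoto phase model for two coupled oscillators. This is an assumption for illustration only and not the simplified model proposed in the article; the natural frequencies, coupling strengths, and step size are arbitrary example values.

```python
import math

def mean_frequency_gap(w1, w2, k, dt=1e-2, steps=20_000):
    """Euler-integrate two Kuramoto-coupled phase oscillators and return
    the average frequency difference over the second half of the run."""
    th1, th2 = 0.0, 0.0
    half = steps // 2
    gap_at_half = 0.0
    for i in range(steps):
        if i == half:
            gap_at_half = th2 - th1
        d1 = w1 + k * math.sin(th2 - th1)  # oscillator 1 pulled toward 2
        d2 = w2 + k * math.sin(th1 - th2)  # oscillator 2 pulled toward 1
        th1 += d1 * dt
        th2 += d2 * dt
    return ((th2 - th1) - gap_at_half) / ((steps - half) * dt)

# In this model the phase difference obeys d(phi)/dt = (w2 - w1) - 2k*sin(phi),
# so the pair locks when |w2 - w1| <= 2k.
locked = mean_frequency_gap(1.0, 1.2, k=0.5)     # 2k = 1.0 >= 0.2: locks
drifting = mean_frequency_gap(1.0, 1.2, k=0.05)  # 2k = 0.1 <  0.2: drifts
print(locked, drifting)
```

A phase-only model like this tracks one state variable per oscillator instead of full device-level waveforms, which is the general reason phase models are cheaper to simulate than direct circuit evaluation.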

One-step Sneak-path Free Read Scheme for Resistive Crossbar Memory

A one-step sneak-path free read scheme for resistive crossbar memory is proposed in this paper. During read operation, it configures the crossbar memory array into a 4-terminal resistance network, which comprises the resistor of the selected cell and three other resistors corresponding to the unselected cells that contribute to the sneak-path. Two sensing voltages with equal potential are applied to three terminals of the network. One is for sensing the resistance of the selected memory cell; the other is for creating zero voltage drop across one of the three resistors, which connects the sneak-path to the selected cell. This effectively suppresses the current injected by the sneak-path to the selected cell sensing loop. This work also proposes a cost-effective data encoding circuit that guarantees at least half of the memory cells are in the high-resistance state, which further minimizes sneak-path current. The impact of key design parameters, such as sensing voltage, switch on-resistance, and the ratio of memory cell resistances in different states, as well as non-ideal effects, e.g. amplifier offset voltage, is investigated. Equations for estimating the maximum size of crossbar array to share a single read circuit are derived. The effectiveness of the proposed design has been validated and studied via circuit simulations.

Redesign the Memory Allocator for Non-Volatile Main Memory

Non-volatile memory (NVM) has the merits of byte-addressability, fast speed, persistency and low power consumption, which make it attractive for use as main memory. Commonly, a user process dynamically acquires memory through memory allocators. However, traditional memory allocators designed with in-place data writes are not appropriate for non-volatile main memory (NVRAM) due to its limited endurance. In this paper, we first quantitatively analyze the wear-obliviousness of a DRAM-oriented allocator (glibc malloc) and the inefficiency of a wear-conscious allocator (NVMalloc). Then, we propose WAlloc, an efficient wear-aware manual memory allocator designed for NVRAM, which (1) decouples metadata and data management; (2) distinguishes metadata by volatility; (3) redirects data writes to achieve wear-leveling; and (4) redesigns an efficient and effective NVM copy mechanism that bypasses the CPU cache and prefetches data explicitly. Finally, experimental results show that the wear-leveling of WAlloc outperforms that of NVMalloc by about 30% and 60% under random workloads and well-distributed workloads, respectively. Besides, WAlloc reduces data memory writes per 64-byte block by an average of 1.5X compared with glibc malloc. While still fulfilling data persistency, the cache-bypassing NVM copy performs about 14% better than the clflush-based NVM copy.

Frontside vs Backside Laser Injection: a Comparative Study

Laser injection can be used to disturb a circuit's computation and retrieve the secret (fault attacks). The laser can illuminate the circuit either from its frontside (i.e., where metal interconnections are first encountered) or from the backside (i.e., through the substrate). Historically, frontside injection was preferred because it does not require the die to be thinned. Nevertheless, due to the increasing integration of metal layers in modern technologies, frontside injections no longer allow targeting any desired location. Indeed, metal lines act like mirrors and they reflect and refract most of the energy provided by the laser beam. Conversely, backside injections, while being more difficult to set up, increase the resolution of the target location and remove the drawbacks of the frontside technique. This paper compares experimental results from frontside and backside fault injections. We show that, contrary to what is generally assumed, frontside injection can provide even better results than backside injection, especially for low-cost beams with a large laser spot.

VLSI Architecture for the Restricted Boltzmann Machine

Neural network (NN) systems are widely used in many important applications ranging from computer vision to speech recognition. To date, most NN systems are processed by general processing units like CPUs or GPUs. However, as the sizes of dataset and network rapidly increase, the original software implementations suffer from long training time. To overcome this problem, specialized hardware accelerators are needed to design high-speed NN systems. This paper presents an efficient hardware architecture of restricted Boltzmann machine (RBM) that is an important category of NN systems. Various optimization approaches at hardware level are performed to improve the training speed. As-soon-as-possible and overlapped-scheduling approaches are used to reduce the latency. It is shown that, compared with the flat design, the proposed RBM architecture can achieve 50% reduction in training time. In addition, an on-the-fly computation scheme is also used to reduce the storage requirement of binary and stochastic states by several hundreds of times. Then, based on the proposed approach, a 784-2252 RBM design example is developed for MNIST handwritten digit recognition dataset. Analysis shows that the VLSI design of RBM achieves 170 times speedup in training as compared to a CPU-based solution with small performance loss.

Trading Accuracy for Energy in Stochastic Circuit Design

As we approach the end of Moore's Law, alternative computing techniques that consume energy more efficiently have been proposed. Stochastic computing (SC) is a re-emerging computing technique that acts on data encoded by bit-streams. It is a low-cost and error-tolerant alternative to conventional binary circuits in some important applications. This paper presents an accuracy-energy tradeoff technique for SC circuits that reduces their energy consumption with virtually no accuracy loss. To this end, we employ voltage/frequency scaling, which normally reduces energy consumption at the cost of timing errors. Then we show that due to their inherent error tolerance, SC circuits operate satisfactorily without significant accuracy loss even with aggressive scaling. Furthermore, we find that logical and physical design techniques can be combined to expand the already powerful accuracy-energy tradeoff possibilities of SC. In particular, we demonstrate that careful adjustment of path delays can lead to error reduction under voltage/frequency scaling. Simulation results show that our optimized SC circuits can tolerate aggressive voltage scaling with no significant SNR degradation after 40% supply voltage reduction (1V to 0.6V), leading to 66% energy saving (20.7pJ to 6.9pJ). We also show that process variation and temperature variation have limited impact on optimized SC circuits.
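The percentages quoted in the abstract follow directly from the figures given in parentheses, as the short Python check below shows. The final comparison against a first-order model in which dynamic energy scales with the square of the supply voltage is an added assumption for context, not a claim made in the abstract.

```python
# Figures taken from the abstract.
v_nominal, v_scaled = 1.0, 0.6     # supply voltage in volts
e_nominal, e_scaled = 20.7, 6.9    # energy in picojoules

voltage_reduction = 1 - v_scaled / v_nominal   # fraction of supply removed
energy_saving = 1 - e_scaled / e_nominal       # fraction of energy saved

# Assumption: first-order dynamic-energy model, E proportional to V^2.
quadratic_model_saving = 1 - (v_scaled / v_nominal) ** 2

print(f"voltage reduction:      {voltage_reduction:.1%}")
print(f"reported energy saving: {energy_saving:.1%}")
print(f"V^2 model energy saving: {quadratic_model_saving:.1%}")
```

The reported saving (about two thirds) is close to what the quadratic model alone would suggest (64%), so the quoted numbers are at least mutually consistent under that simple assumption.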

Source Authentication Techniques for Network-on-Chip Router Configuration Packets

It is known that maliciously configured Network-on-Chip (NoC) routers can enable an attacker to launch different attacks inside a Multiprocessor System-on-Chip (MPSoC). A source authentication mechanism for router configuration packets can prevent such vulnerability. This ensures that a router is configured only by the configuration packets sent by a trusted configuration source. A conventional method such as Secure Hash Algorithm-3 (SHA-3) can provide the required source authentication in a router, but with a router area overhead of 1355.25% compared to a normal router area. We propose eight source authentication mechanisms that can achieve a similar level of security as SHA-3 from a router configuration perspective without causing a significant area and power increase. Also, our proposed techniques require <102 × processing time compared to the SHA-3 implementation. Most of our proposed techniques use different timing channel watermarking methods to transfer source authentication data to the receiver router. We also propose the Individual packet based stream authentication (IPSA) technique and combinations of this technique with timing channel watermarking techniques. It is shown that among all of our proposed techniques, the maximum router area increment required is 28.32% compared to a normal router, for a proposed technique called Distinct interval Reordering based timing channel watermarking (DIRTW).

Asymmetric Underlapped FinFETs for Near- and Super-threshold Logic at Sub-10nm Technology Nodes

Extending double-gate FinFET scaling to sub-10nm technology regime requires device engineering techniques for countering the rise of direct source to drain tunneling (DSDT), edge direct tunneling (EDT) and short channel effects (SCE) that degrade FinFET I-V characteristics. Symmetric underlap is effective for eliminating EDT, diminishing DSDT and lowering the fringe component of gate capacitance. However, excessive symmetric underlap also lowers the on-current which is mainly due to thermionic emission. In this work, it is demonstrated that at sub-10nm node, asymmetric underlapped FinFETs with slightly longer underlap toward drain side than source side are superior to symmetric underlapped FinFETs due to further improvement in Ion/Ioff and reduction in gate-to-drain capacitance. Using quantum mechanical device simulations, FinFETs with various degrees of underlap have been analyzed for improvement in I-V characteristics. A FinFET model for circuit simulations has been constructed that captures the major sub-10nm leakage components, namely, thermionic emission, DSDT, EDT, direct gate oxide tunneling and its associated components. By simulating a 10 stage NAND circuit and a LEON3 processor with interconnect parasitics using these devices, it is shown that asymmetric underlap instead of symmetric underlap in sub-10nm FinFETs can offer lower energy consumption with improved performance for near-threshold logic and higher energy-efficiency for super-threshold logic operation.

Guest Editorial: Special Issue on Nanoelectronic Circuit and System Design Methods for the Mobile Computing Era

Electro-Photonic NoC Designs for Kilocore Systems

The increasing core count in manycore systems requires a corresponding large Network-on-chip (NoC) bandwidth to support the overlying applications. However, it is not possible to provide this large bandwidth in an energy-efficient manner using electrical link technology. To overcome this issue, photonic link technology has been proposed as a replacement. This work explores the limits and opportunities for using photonic links to design the NoC architecture for a future Kilocore system. Three different NoC designs are explored: ElecNoC, an electrical concentrated 2D-mesh NoC; HybNoC, an electrical concentrated 2D-mesh with a photonic multi-crossbar NoC; and PhotoNoC, a photonic multi-bus NoC. We consider both private and shared cache architectures and, to leverage the large bandwidth density of photonic links, we investigate the use of prefetching and aggressive non-blocking caches. Our analysis using contemporary Big Data workloads shows that the non-blocking caches with a shared LLC can best leverage the large bandwidth of the photonic links in the Kilocore system. Moreover, compared to ElecNoC-based and HybNoC-based Kilocore systems, PhotoNoC-based Kilocore system achieves up to 2.5X and 1.5X better performance, respectively, and can support up to 2.1X and 1.1X higher bandwidth, respectively, while dissipating comparable power in the overall system.


Publication Years 2005-2016
Publication Count 307
Citation Count 736
Available for Download 307
Downloads (6 weeks) 2490
Downloads (12 Months) 16692
Downloads (cumulative) 132934
Average downloads per article 433
Average citations per article 2
First Name Last Name Award
Iris Bahar ACM Distinguished Member (2012)
Krishnendu Chakrabarty ACM Distinguished Member (2008)
ACM Senior Member (2006)
Nikil D. Dutt ACM Distinguished Member (2007)
Igor Markov ACM Distinguished Member (2011)
ACM Senior Member (2007)
Dharmendra Modha ACM Gordon Bell Prize, Special Category (2009)
Saraju P. Mohanty ACM Senior Member (2010)
Trevor Mudge ACM-IEEE CS Eckert-Mauchly Award (2014)
Massoud Pedram ACM Distinguished Member (2008)
Steven K Reinhardt ACM Distinguished Member (2010)

First Name Last Name Paper Counts
Niraj Jha 18
Krishnendu Chakrabarty 10
Wei Zhang 7
Michael Niemier 6
Rodney Van Meter 6
Yuan Xie 6
Li Shang 6
Xiaobosharon Hu 5
Partha Pande 5
Mehdi Tahoori 5
Fabrizio Lombardi 5
Mariagrazia Graziano 4
Shuo Wang 4
Pierre Gaillardon 4
Kaushik Roy 4
Lei Wang 4
Alvin Lebeck 4
Mohammad Tehranipoor 4
Chris Dwyer 4
Morteza Zamani 3
Spyros Tragoudas 3
Mahdi Nikdast 3
Michael Crocker 3
Paul Wettin 3
Jordi Cortadella 3
Sourindra Chaudhuri 3
Jianwei Dai 3
Mehdi Sedighi 3
Aoxiang Tang 3
Yaoyao Ye 3
Bhargab Bhattacharya 3
Tao Xu 3
Fei Su 3
Ferdinand Peper 3
Arun Ravindran 3
Weisheng Zhao 3
Jacques Klein 3
Nagarajan Ranganathan 3
Maurizio Zamboni 3
Robert Wille 3
Eren Kursun 3
Arindam Mukherjee 3
Rolf Drechsler 3
Xiaowen Wu 3
Kishor Trivedi 3
Rivalino Matias 2
Lei Wang 2
Ashok Palaniswamy 2
Jing Huang 2
André DeHon 2
Suman Datta 2
Tsungyi Ho 2
Mehrdad Nourani 2
Min Chen 2
Alexis De Vos 2
Faquir Jain 2
Ruth Bahar 2
Anil Wipat 2
Mahboobeh Houshmand 2
Shashikanth Bobba 2
Pallav Gupta 2
Marco Ottavi 2
Bharat Joshi 2
Josep Carmona 2
Philippe Coussy 2
Xianmin Chen 2
Arighna Deb 2
Baris Taskin 2
Mona Arabzadeh 2
Giovanni De Micheli 2
Jaidev Patwardhan 2
Luca Schiano 2
Yongtae Kim 2
Prateek Mishra 2
Ulf Schlichtmann 2
Amlan Ganguly 2
Eric Rachlin 2
Vijaykrishnan Narayanan 2
Peng Li 2
Hafizur Rahaman 2
Dhiraj Pradhan 2
Torben Mogensen 2
Behrooz Shirazi 2
Xuan Wang 2
Vijay Reddy 2
Sumeet Gupta 2
Mostafizur Rahman 2
Csaba Moritz 2
Himanshu Thapliyal 2
Jiang Xu 2
Frederic Chong 2
Siddhartha Datta 2
Mehdi Saeedi 2
Chiachun Lin 2
Djaafar Chabi 2
Mrigank Sharad 2
Reza Rad 2
Santosh Khasanvis 2
Chris Myers 2
Douglas Densmore 2
Sudip Roy 2
Kolin Paul 2
Stefano Frache 2
Yaojun Zhang 2
Bao Liu 2
Cheng Zhuo 2
Oliver Keszocze 2
Massoud Pedram 2
Jacob Murray 2
Giovanni Micheli 2
Saibal Mukhopadhyay 2
John Savage 2
Byungsoo Choi 2
Damien Querlioz 2
Pinaki Mazumder 1
Jie Zhang 1
Isaac Chuang 1
Kohei Itoh 1
Minlun Chuang 1
Harika Manem 1
Mark Oskin 1
Aravinda Kar 1
Ashwani Sharma 1
Jin He 1
Kevin Chang 1
Xinmin Yu 1
M Balakrishnan 1
Michael Leuchtenburg 1
Pavan Panchapakeshan 1
Csaba Moritz 1
Jan Madsen 1
Sandip Tiwari 1
Weichen Liu 1
Vineet Sahula 1
Yang Du 1
Bertrand Granado 1
Nasim Farahini 1
Ahmed Hemani 1
Ashkan Eghbal 1
Amirali Ghofrani 1
Luke Theogarajan 1
Kyuho Park 1
Hyunchul Seok 1
Chulmin Kim 1
Jun Pang 1
Alex Yakovlev 1
Simon Davidson 1
Steve Furber 1
Steve Temple 1
Nor Haron 1
Said Hamdioui 1
Roberto Natella 1
Roman Lysecky 1
Janet Roveda 1
Qian Wang 1
Juinndar Huang 1
Jungsang Kim 1
Yuan Xue 1
Chengmo Yang 1
Guillaume Prenat 1
Debesh Das 1
Chung Lam 1
Gregory Corrado 1
Roger Cheek 1
Charles Rettner 1
Wengfai Wong 1
Chengkok Koh 1
R Williams 1
Dmytro Apalkov 1
Trongnhan Le 1
Arnaud Carer 1
Wang Kang 1
Penli Huang 1
Hai Li 1
Yiran Chen 1
Vlasia Anagnostopoulou 1
Georgios Varsamopoulos 1
Hengxing Tan 1
Moustafa Mohamed 1
Shinobu Fujita 1
Thomas Lee 1
Stijn De Baerdemacker 1
Muzaffer Simsir 1
Jinho Lee 1
Kyungsu Kang 1
Naser MohammadZadeh 1
Weikai Shih 1
Bryant Wysocki 1
Nathan McDonald 1
Stefan Hillmich 1
Guido Bertoni 1
Stefano Sanfilippo 1
Ruggero Susella 1
Yingjie Lao 1
Navid Asadizanjani 1
Kenneth Ramclam 1
Kevin Scott 1
Jason Cong 1
Shinjiro Toyoda 1
Jie Chen 1
Natalio Krasnogor 1
Jude Rivers 1
Philip Brisk 1
Fuwei Chen 1
Amlan Gangul 1
Saeed Safari 1
Vineeth Vijayakumaran 1
Manoj Yuvaraj 1
Paolo Grani 1
Chunyao Wang 1
William Munro 1
Garrett Rose 1
Makoto Naruse 1
Naokatsu Yamamoto 1
Motoichi Ohtsu 1
Hu Xu 1
Bryan Black 1
Douglas Tougaw 1
Yang Zhao 1
Mircea Stan 1
Timothy Dysart 1
Michael Gladshtein 1
Jeffrey Krichmar 1
Yong Zhang 1
Benoît Miramond 1
Hugues Wouafo 1
Siddharth Gaba 1
Seongmin Kim 1
Basit Sheikh 1
Michael Kishinevsky 1
Delong Shang 1
Claude Cirba 1
Cathy Chancellor 1
Ahmed Louri 1
Rubens Matos 1
F De Souza 1
Lungyen Chen 1
Sparsh Mittal 1
Laurent Becker 1
Kalyan Biswas 1
Jeff Siebert 1
Ningning Wang 1
Brendan O'Flynn 1
Cian O'Mathuna 1
Piero Fariselli 1
Matthew Breitwisch 1
Kailash Gopalakrishnan 1
Niladri Mojumder 1
Xuanyao Fong 1
Jianjia Chen 1
Lothar Thiele 1
Adrian Ong 1
Eugene Chen 1
Bruce Tidor 1
Jie Meng 1
Pritish Narayanan 1
Giacomo Indiveri 1
Can Sitik 1
Emre Salman 1
Suzanne Lesecq 1
Jing Li 1
Muthukumar Murugan 1
Zahra Abbasi 1
Sanjay Ranka 1
Phanisekhar Bv 1
Kevin Fox 1
Christopher Mundy 1
Johnnie Chan 1
Zheng Li 1
Yaowen Chang 1
Robert Glück 1
Indranil Sengupta 1
Tayebeh Bahreini 1
Lu Wang 1
Martin Barke 1
Clare Thiem 1
Milan Patnaik 1
Tinoosh Mohsenin 1
Q Shi 1
Swapnil Bhatia 1
Manojit Dutta 1
Wulong Liu 1
Woohyung Lee 1
Naseef Mansoor 1
Guru Venkataramani 1
Teng Lu 1
Zhehui Wang 1
H Wong 1
Subhasish Mitra 1
James Donald 1
Xiaojun Ma 1
Jeremy Tolbert 1
Tadashi Kawazoe 1
Jiale Huang 1
Mukta Farooq 1
Charles Lieber 1
Adam Cabe 1
Kushal Datta 1
Yehia Massoud 1
Igor Markov 1
Yu Cao 1
Konrad Walus 1
Ajay Bhoj 1
Jorge Kina 1
ChiOn Chui 1
Paul Pop 1
Stanley Yeh 1
Chunyao Wang 1
Zhehui Wang 1
Ingchao Lin 1
Arnab Raha 1
Laura Conde-Canencia 1
Wei Lu 1
Masud Chowdhury 1
Alexander Gotmanov 1
Xuefu Zhang 1
Fei Xia 1
Junchen Liu 1
Gabriela Nicolescu 1
Yuliang Jin 1
Karthik Shankar 1
Moonseok Kim 1
Christophe Layer 1
Saraju Mohanty 1
Taşkın Koçak 1
Lyn Venken 1
Francesco Abate 1
Siva Narendra 1
Sungkyu Lim 1
Alejandro Schrott 1
Mohamad Krounbi 1
Steven Watts 1
Yousuke Takada 1
Shihhsien Kuo 1
Xiang Wei 1
Gianluca Piccinini 1
Dafine Ravelosona 1
Xiaoxia Wu 1
Anish Muttreja 1
Yang Liu 1
Sheyshi Lu 1
Shriram Raghunathan 1
Chandrakant Patel 1
Cullen Bash 1
Susmit Biswas 1
Heba Saadeldeen 1
Ricardo Bianchini 1
Raymond Beausoleil 1
Xi Chen 1
Gaurav Rathi 1
Franjo Ivančić 1
Martin Roetteler 1
Yifang Liu 1
Yang Yi 1
Shankar Balachandran 1
Veezhinathan Kamakoti 1
Junlin Chen 1
Mark Tehranipoor 1
Swaroop Ghosh 1
Yier Jin 1
Amey Kulkarni 1
Yong Zhan 1
Natsuo Nakamura 1
Taeho Kgil 1
Dan Venutolo 1
Erik Lindgren 1
Jennifer Hallinan 1
Harold Fellermann 1
Zhiqiang Li 1
Bibhash Sen 1
Guangyu Sun 1
Masoud Zamani 1
Huazhong Yang 1
Tingting Hwang 1
Anuroop Vidapalapati 1
Matthias Beste 1
Darshan Thaker 1
Kouichi Akahane 1
Daniel Sorin 1
Minhao Zhu 1
Suman Sah 1
Benjamin Belzer 1
Vivek Shende 1
Jonathan Bean 1
Weiguo Tang 1
Okan Palaz 1
S Srinivasan 1
Mohsen Raji 1
Hossein Pedram 1
Shunming Syu 1
Syyen Kuo 1
Rodney Meter 1
Marc Galceran-Oms 1
John Bainbridge 1
Aaron Dingler 1
Stefano Russo 1
Senthil Arasu 1
Fumio Machida 1
Jean Araujo 1
Paulo Maciel 1
William Cane-Wissing 1
Muhammad Ahsan 1
Loic Decloedt 1
Angsuman Sarkar 1
Justin Wenck 1
Rajeevan Amirtharajah 1
Mike Hayes 1
Kathleen Marchal 1
Jos Vanderleyden 1
Damiano Piovesan 1
Enrico Macii 1
Wujie Wen 1
Vladimir Nikitin 1
Daniel Lottis 1
Kiseok Moon 1
Daniel Mange 1
Oana Boncalo 1
H Wong 1
Ayse Coskun 1
Ruiyu Wang 1
Massimo Roch 1
Roger Lake 1
Zhaohao Wang 1
Kwangting Cheng 1
Margot Damaser 1
Martin Arlitt 1
Tao Yang 1
David Du 1
Tridib Mukherjee 1
Hafiz Sheikh 1
Ishfaq Ahmad 1
Landon Sego 1
Manish Vachharajani 1
Mark Cianchetti 1
David Albonesi 1
Chialin Yang 1
Srihari Cadambi 1
Tanay Karnik 1
Milad Maleki 1
Houle Gan 1
Domenic Forte 1
Sina Shahbazmohamadi 1
Marco Indaco 1
Chris Kim 1
Sanjukta Bhanja 1
Nobuaki Miyakawa 1
Z Wang 1
Huaixiu Zheng 1
Curtis Madsen 1
Ashutosh Chakraborty 1
Xuehui Zhang 1
Christopher Curtis 1
Yuchun Ma 1
Haera Chung 1
Kae Nemoto 1
Simeranjit Brar 1
Jiang Xu 1
Steven Rubin 1
Gilda Garretón 1
Sujay Deb 1
Deukhyoun Heo 1
Nabanita Majumdar 1
Nadine Gergel-Hackett 1
Yuxing Yao 1
Ketan Patel 1
Wei Zhao 1
Gabriel Schulhof 1
Prachi Joshi 1
Yungchih Chen 1
Jifeng Chen 1
Dong Xiang 1
Renu Kumawat 1
Amlan Chakrabarti 1
Jiunli Lin 1
Bernard Girau 1
Laurent Rodriguez 1
Hrishikesh Jayakumar 1
Woosuk Lee 1
Zhongqi Li 1
Aldo Romani 1
Nahid Hossain 1
Chinghwa Cheng 1
Victor Nicola 1
Avinash Kodi 1
Mathias Soeken 1
D Miller 1
Bernard Diény 1
Chandan Sarkar 1
Bipul Paul 1
Bryan Jackson 1
Charles Augustine 1
Hai Li 1
Andy Tyrrell 1
Andrew Greensted 1
Joël Rossier 1
Alexander Khitun 1
Mircea Vlăduţiu 1
Lukáš Sekanina 1
Alain Pegatoquet 1
Olivier Berder 1
K Habib 1
Saber Moradi 1
Daniel Fasnacht 1
Leo Filippini 1
Huaiyuan Tseng 1
Yujie Huang 1
Yaohong Wang 1
Krishna Kant 1
Giacomo Ghidini 1
Andrew Rawson 1
Tahir Cader 1
William Gustafson 1
Aleksandr Biberman 1
Qianfan Xu 1
Alan Mickelson 1
Bipul Paul 1
Masaki Okajima 1
Pinghung Yuh 1
Eva Rotenberg 1
Ismo Hänninen 1
Craig Lent 1
Ryangary Kim 1
Niraj Jha 1
Alaeddin Aydiner 1
Chenyuan Zhao 1
Chidhambaranathan R 1
Chirag Garg 1
Arnab Roy 1
Gerardo Pelosi 1
Yu Bi 1
Jiannshiun Yuan 1
Cesare Ferri 1
Sherief Reda 1
Steven Reinhardt 1
Jonathan Salkind 1
Qiaoyan Yu 1
Chris Winstead 1
Ernst Oberortner 1
Tara Deans 1
Fatima Hadjam 1
Hanwu Chen 1
Joseph Horton 1
Andrew Ferraiuolo 1
Hanieh Mirzaei 1
Bo Yuan 1
Bin Li 1
Mehdi Kamal 1
Andres Kwasinski 1
Carlotta Guiducci 1
Naoya Tate 1
Leyla Nazhandali 1
Shengqi Yang 1
Robert Hannon 1
Jamil Wakil 1
Yue Wu 1
Daniel Davids 1
Aditya Prasad 1
Jun Zeng 1
Mariam Momenzadeh 1
John Hayes 1
Graham Jullien 1
Rajeswari Devadoss 1
Elena Maftei 1
Jiale Liang 1
S Wong 1
Kele Shen 1
Sarmishtha Ghoshal 1
Jing Xie 1
Nikil Dutt 1
Benoit Chappet De Vangel 1
César Torres-Huitzil 1
Pooria Yaghini 1
Nader Bagherzadeh 1
Cyrille Chavet 1
Chiahung Chien 1
Marco Tartagni 1
Dongjae Shin 1
Minkyu Maeng 1
Luis Plana 1
David Clark 1
Jim Garside 1
Eustace Painkras 1
Evan Lent 1
Marc Jaekel 1
Domenico Cotroneo 1
Jin Sun 1
Vandi Alves 1
Sumeet Gupta 1
Yihang Chen 1
Anja Von Beuningen 1
Luca Ramini 1
Virgile Javerliac 1
Kotb Jabeur 1
Stephane Gros 1
Pierre Paoli 1
Keqin Li 1
Chengwen Wu 1
Jamie Collier 1
Rita Casadio 1
Ramprasad Ravichandran 1
Bulent Kurdi 1
Dharmendra Modha 1
Geoffrey Burr 1
Sri Choday 1
Yiran Chen 1
Jun Yang 1
Clemens Moser 1
Alexey Khvalkovskiy 1
Mary Eshaghian-Wilner 1
Lucian Prodan 1
Mihai Udrescu 1
Jacob White 1
Gordon Wan 1
Olivier Sentieys 1
Mostafa Azghadi 1
Mehmet Ozdas 1
Edith Beigné 1
Tsungching Huang 1
Paul Falkenstern 1
Mohamad Sawan 1
Dang Nguyen 1
Steve Majerus 1
Aditya Bansal 1
Zhenyu Sun 1
Amip Shah 1
Radu Marculescu 1
Sajal Das 1
Michal Lipson 1
Keren Bergman 1
Hongyu Zhou 1
Holger Axelsen 1
Greg Snider 1
Trung Nguyen 1
Swarup Bhunia 1
El Hasaneen 1
Kiyoung Choi 1
William Fornaciari 1
Vivek De 1
Elena Vatajelu 1
Anirudh Iyengar 1
Kaveh Shamsi 1
Xunzhao Yin 1
Jayita Das 1
Glenn Reinman 1
Shigeto Nakayama 1
Trevor Mudge 1
Nathan Binkert 1
David Wolpert 1
Paul Ampadu 1
Nicholas Roehner 1
Rudolf Füchslin 1
Herbert Sauro 1
Goksel Misirli 1
Biplab Sikdar 1
Mark Hagan 1
Daniel Grissom 1
Nishad Nerurkar 1
Sandro Bartolini 1
Jie Chen 1
Christine Nardini 1
Pratik Kabali 1
Weichen Liu 1
Yao Xu 1
Ransford Hyman 1
Narayanan Komerath 1
Fiona Teshome 1
Yang Liu 1
Vasilis Pavlidis 1
Debasis Mitra 1
Gabriel Loh 1
Lloyd Harriott 1
Arthur Nieuwoudt 1
Sezer Gören 1
Behnam Ghavami 1
Luke Pierce 1
Manoj Gaur 1
Ozan Ozbag 1
Oluleye Olorode 1
Juha Plosila 1
Hannu Tenhunen 1
Nilanjan Goswami 1
Rangharajan Venkatesan 1
Anand Raghunathan 1
Michele Dini 1
Woomin Hwang 1
Jeffrey Pepper 1
Ian O'Connor 1
M Amadou 1
Marco Vacca 1
Philippe Matherat 1
John Jr 1
Haldun Kufluoglu 1
Xueqing Li 1
Xun Gao 1
Davide Bertozzi 1
Hoda Khouzani 1
Fabrice Bernard-Granger 1
Gregory Pendina 1
Giuseppe Profiti 1
Pier Martelli 1
Andrea Acquaviva 1
Arijit Raychowdhury 1
Alvaro Padilla 1
Simone Raoux 1
Satish Kumar 1
Sayeef Salahuddin 1
Xueti Tang 1
Xiao Luo 1
Shiva Navab 1
Kang Wang 1
Albert Lin 1
Azzurra Pulimeno 1
Gefei Wang 1
Youguang Zhang 1
Chunyi Lee 1
Muhammad Salam 1
Michael Suster 1
Paul Fletter 1
Hsinhung Liao 1
Tao Wang 1
Jia Lee 1
Wen Ko 1
Pedro Irazoqui 1
Xiang Chen 1
Prithviraj Banerjee 1
Alan Savage 1
Siddharth Garg 1
Diana Marculescu 1
Andrès Márquez 1
Jacob Levy 1
Michael Thomsen 1
Kamalika Datta 1
Umamaheswara Tida 1
Pragyan Mohanty 1
Yiyu Shi 1
V Devanathan 1
Alessandro Barenghi 1
Shahed Quadir 1
John Chandy 1
Mario Barbareschi 1
Paolo Prinetto 1
Jaewon Jang 1
Chengwei Lin 1
Giovanni De Micheli 1
Yuchun Ma 1
Yongxiang Liu 1
Sachin Sapatnekar 1
Yutaka Sacho 1
Eiri Hashimoto 1
Ali Saidi 1
Andy Chiu 1
Haiyao Huang 1
Claudio Moraga 1
Marek Perkowski 1
Xiaoyu Song 1
Md Rahman 1
Gerhard Dueck 1
Samik Some 1
Olivier Thomas 1
Yu Cao 1
Yu Wang 1
Howie Huang 1
Tzvetan Metodi 1
Andrew Cross 1
Jeyavijayan Rajendran 1
Michael Henry 1
Ashok Srivastava 1
Venkataraman Mahalingam 1
Xuemei Chen 1
Clay Mayberry 1
Benjamin Gojman 1
Krishnendu Chakrabarty 1
Kerry Bernstein 1
James Tour 1
Garrett Rose 1
H Ugurdag 1
Meng Zhang 1
Kenichi Morita 1
V Kamakoti 1
Soumya Eachempati 1
Peter Kogge 1
Zahra Sasanian 1
H Wong 1
A Bhattacharya 1
Yuan Xie 1
Jaeyoon Kim 1
Spyros Tragoudas 1
Niraj Jha 1
Pohsun Wu 1
Syed Jafri 1
Misagh Khayambashi 1
Vijay Raghunathan 1
Jue Wang 1
Tao Li 1
Miguel Lastras-Montano 1
Melika Payvand 1
Kwangting Cheng 1
Matteo Filippi 1
Dongjin Kim 1
Rajit Manohar 1
Fabien Clermidy 1
Roberto Pietrantuono 1
Jing Zhao 1
Yanbin Wang 1
Gautam Kapila 1
Jack Sampson 1
Abbas Dehghani 1
Kamal Jamshidi 1
Jianyu Chen 1
Sylvain Claireux 1
Guangyan Zhang 1
Wensi Wang 1
Terence O'Donnell 1
Elisa Ficarra 1
Rohit Shenoy 1
Bipin Rajendran 1
Subho Chatterjee 1
Alexander Driskill-Smith 1
André Stauffer 1
Pierre Mudry 1
Gianluca Tempesti 1
Marya Lieberman 1
Jie Deng 1
Tiansheng Zhang 1
Yue Zhang 1
Claude Chappert 1
Sungjun Yoon 1
Chenpang Kung 1
Steven Garverick 1
Yaojoe Yang 1
Swaroop Ghosh 1
Diana Franklin 1
Sandeep Gupta 1
Kyle Preston 1
Gilbert Hendry 1
Nicolas Sherwood-Droz 1
William Hwang 1
Stéphane Burignat 1
Tetsuo Yokoyama 1
Alireza Shafaei 1
Rajat Chakraborty 1
Davide Zoni 1
Andrew Kahng 1
Luca Breveglieri 1
Qianying Tang 1
Keshab Parhi 1
Giorgio Natale 1
Lionel Torres 1
Youngok Pino 1
Matthew French 1
Takanori Maebashi 1
Krisztiàn Flautner 1
Dennis Huo 1
Zhen Zhang 1
Maik Hadorn 1
Perrine Batude 1
Shengqi Yang 1
Wenping Wang 1
Wei Zhang 1
Christof Teuscher 1
Ali Afzali-kusha 1

Affiliation Paper Counts
Universite Pierre et Marie Curie 1
Polytechnic University - Brooklyn 1
University of Victoria 1
Feng Chia University 1
Southeast University China, Nanjing 1
Google Inc. 1
Yangzhou University 1
University of Kansas 1
Centre Hospitalier de L'Universite de Montreal 1
The University of British Columbia 1
Samsung Group 1
State University of New York at New Paltz 1
Ohio University Athens 1
Nanzan University 1
University of Texas at Austin 1
Brno University of Technology 1
University of Waterloo 1
University of Maryland, Baltimore 1
Peking University 1
University of Washington 1
Defence Research and Development Organisation India 1
University of Washington Seattle 1
Japan Science and Technology Agency 1
Zurich University of Applied Sciences Winterthur 1
Yeditepe University 1
National University of Singapore 1
George Mason University 1
Advanced Micro Devices, Inc. 1
Harbin Institute of Technology 1
Chang Gung University 1
Hewlett-Packard 1
University of Twente 1
Chongqing University 1
Oak Ridge National Laboratory 1
University of North Texas 1
University of California, Berkeley 1
Valparaiso University 1
University of Oxford 1
Federal University of Piaui 1
Cadence Design Systems 1
Research Organization of Information and Systems National Institute of Informatics 1
Wuhan University 1
Chung Yuan Christian University 1
University of California, San Diego 1
Rutgers, The State University of New Jersey 1
Hiroshima University 1
University of Copenhagen 1
University of California System 1
NEC Corporation 1
Utah State University 1
Universite Nice Sophia Antipolis 1
Texas Instruments (India) Ltd 1
National Taiwan University Hospital 1, Inc. 1
Ozyegin University 1
ORT Braude - College of Engineering 1
ARM Ltd. 1
Kalyani Government Engineering College 1
MCKV Institute of Engineering 1
University of Calgary 2
University of Siena 2
Harbin Engineering University 2
University of Minnesota System 2
Delft University of Technology 2
University of Turku 2
University of Missouri-Kansas City 2
Commissariat a L'Energie Atomique CEA 2
Laboratoire d'Informatique, de Robotique et de Microelectronique de Montpellier LIRMM 2
Harvard University 2
Qualcomm Incorporated 2
Seoul National University 2
Federal University of Uberlandia 2
University of Southern California, Information Sciences Institute 2
Kirtland Air Force Base 2
Shahed University 2
Louis Stokes Cleveland VA Medical Center 2
University of Science and Technology of China 2
University of Bristol 2
Jadavpur University 2
Bahcesehir University 2
Johannes Kepler University Linz 2
STMicroelectronics 2
Southern Illinois University 2
University of Ferrara 2
Missouri University of Science and Technology 2
Daneshgahe Esfahan 2
Stony Brook University 2
Virginia Tech 2
University of Seoul 2
Oracle Corporation 2
Carnegie Mellon University 2
University of Alberta 2
California Institute of Technology 2
Hefei National Laboratory for Physical Sciences at Microscale 2
Universite de Lyon 2
Toshiba America Research, Inc 2
European Centre for Soft Computing 2
Universite de Lorraine 2
Indian Institute of Technology 2
George Washington University 3
Indian Institute of Technology, Kharagpur 3
Louisiana State University 3
Technical University of Denmark 3
National Chiao Tung University Taiwan 3
Beihang University 3
University of York 3
University of New Brunswick 3
Malaviya National Institute of Technology 3
NEC Laboratories America, Inc. 3
Air Force Research Laboratory 3
University of Tehran 3
University of Delaware 3
Indian Statistical Institute, Kolkata 3
University of Maryland, Baltimore County 3
Catholic University of Leuven 3
Indian Institute of Technology, Delhi 3
Polytechnic University of Timisoara 4
Tyndall National Institute at National University of Ireland, Cork 4
Royal Institute of Technology 4
University of Texas at Arlington 4
Universite de Bretagne-Sud 4
Nanyang Technological University 4
University of Texas at Dallas 4
University of Arizona 4
Polytechnic School of Montreal 4
Technical University of Munich 4
Shanghai University 4
Portland State University 4
University of Tokyo 4
University of Rochester 4
University of Southern California 4
Columbia University 4
Universite de Rennes 1 4
Federal University of Pernambuco 4
Ghent University 4
Karlsruhe Institute of Technology 4
National Institute of Technology, Durgapur 4
Rice University 5
Universitat Politecnica de Catalunya 5
University of Florida 5
University of Naples Federico II 5
National Tsing Hua University 5
Texas A and M University 5
University of Central Florida 5
Massachusetts Institute of Technology 5
Pacific Northwest National Laboratory 5
University of Utah 5
University of Minnesota Twin Cities 5
University of California, Riverside 5
Japan National Institute of Information and Communications Technology 5
Case Western Reserve University 6
University of Texas at San Antonio 6
University of California, Irvine 6
University of California, Davis 6
National Cheng Kung University 6
University of Pittsburgh 6
New York University 6
University of Virginia 6
Politecnico di Milano 6
Universite Paris-Sud XI 6
Southern Illinois University at Carbondale 6
Arizona State University 7
Newcastle University, United Kingdom 7
Drexel University 7
Indian Institute of Technology, Madras 7
University of Manchester 8
Keio University 8
Korea Advanced Institute of Science & Technology 8
The Institute of Fundamental Electronics, Orsay 8
Rochester Institute of Technology 9
IBM Almaden Research Center 9
HP Labs 9
Brown University 9
Tsinghua University 9
Swiss Federal Institute of Technology, Zurich 9
University of California, Los Angeles 9
National Taiwan University 10
Cornell University 10
Texas Instruments 10
IBM Thomas J. Watson Research Center 10
Boston University 10
University of Michigan Ann Arbor 10
Bremen University 11
University of Massachusetts Amherst 11
University of Bologna 11
University of Colorado at Boulder 12
The University of North Carolina at Charlotte 12
Stanford University 13
Georgia Institute of Technology 13
Swiss Federal Institute of Technology, Lausanne 13
University of South Florida Tampa 14
Amirkabir University of Technology 15
Northeastern University 16
University of California, Santa Barbara 16
Intel Corporation 17
Hong Kong University of Science and Technology 18
Pennsylvania State University 19
Washington State University 20
Polytechnic Institute of Turin 21
Purdue University 23
University of Notre Dame 23
University of Connecticut 30
Duke University 32
Princeton University 43

ACM Journal on Emerging Technologies in Computing Systems (JETC) - Regular Papers

Volume 12 Issue 4, July 2016 Regular Papers
Volume 13 Issue 1, June 2016 Issue-in-Progress

Volume 12 Issue 3, September 2015 Special Issue on Cross-Layer System Design and Regular Papers
Volume 12 Issue 2, August 2015 Special Issue on Advances in Design of Ultra-Low Power Circuits and Systems in Emerging Technologies
Volume 12 Issue 1, July 2015
Volume 11 Issue 4, April 2015 Special Issues on Neuromorphic Computing and Emerging Many-Core Systems for Exascale Computing

Volume 11 Issue 3, December 2014 Special Issue on Computational Synthetic Biology and Regular Papers
Volume 11 Issue 2, November 2014 Special Issue on Reversible Computation and Regular Papers
Volume 11 Issue 1, September 2014
Volume 10 Issue 4, May 2014
Volume 10 Issue 3, April 2014
Volume 10 Issue 2, February 2014
Volume 10 Issue 1, January 2014 Special Issue on Reliability and Device Degradation in Emerging Technologies and Special Issue on WoSAR 2011

Volume 9 Issue 4, November 2013 Special Issue on Bioinformatics
Volume 9 Issue 3, September 2013
Volume 9 Issue 2, May 2013 Special Issue on Memory Technologies
Volume 9 Issue 1, February 2013

Volume 8 Issue 4, October 2012
Volume 8 Issue 3, August 2012
Volume 8 Issue 2, June 2012 Special Issue on Implantable Electronics
Volume 8 Issue 1, February 2012

Volume 7 Issue 4, December 2011
Volume 7 Issue 3, August 2011
Volume 7 Issue 2, June 2011
Volume 7 Issue 1, January 2011

Volume 6 Issue 4, December 2010
Volume 6 Issue 3, August 2010
Volume 6 Issue 2, June 2010
Volume 6 Issue 1, March 2010

Volume 5 Issue 4, November 2009
Volume 5 Issue 3, August 2009
Volume 5 Issue 2, July 2009
Volume 5 Issue 1, January 2009

Volume 4 Issue 4, October 2008
Volume 4 Issue 3, August 2008
Volume 4 Issue 2, April 2008
Volume 4 Issue 1, March 2008
Volume 3 Issue 4, January 2008

Volume 3 Issue 3, November 2007
Volume 3 Issue 2, July 2007
Volume 3 Issue 1, April 2007

Volume 2 Issue 4, October 2006
Volume 2 Issue 3, July 2006
Volume 2 Issue 2, April 2006
Volume 2 Issue 1, January 2006

Volume 1 Issue 3, October 2005
Volume 1 Issue 2, July 2005
Volume 1 Issue 1, April 2005