Application and Thermal Reliability-Aware Reinforcement Learning Based Multi-Core Power Management
Quantum information processing and communication techniques rely heavily on entangled quantum states, which motivates the development of methods and systems that generate entanglement. Much research has been dedicated to entangling radix-2 qubits, resulting in the Bell state generator and its generalized forms in which more than two qubits are entangled; until now, however, higher-radix quantum entanglement has been largely overlooked. In this work, techniques for quantum state entanglement in high-dimensional systems are described. These higher-dimensional quantum informatic systems comprise n quantum digits, or qudits, each mathematically characterized as an element of an r-dimensional Hilbert vector space where r > 2. Consequently, the wavefunction of an n-qudit system is a time-dependent state vector of dimension r^n. Theoretical analyses and specific higher-radix entanglement generators are discussed.
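A two-qudit entanglement generator can be sketched numerically. This is a minimal sketch that assumes the common qudit analogue of the Bell circuit: an r-dimensional Chrestenson (quantum Fourier) gate on the first qudit, followed by a controlled modular-addition gate |a, b> -> |a, (a + b) mod r>. The specific generators discussed in this work may differ, and the function names are hypothetical. Starting from |00>, the result is the maximally entangled state (1/sqrt(r)) * sum_k |kk>. Its r^2 amplitudes are stored as a flat list indexed by a*r + b.

```python
import cmath
import math


def chrestenson(r):
    """r x r generalized Hadamard (quantum Fourier) matrix."""
    w = cmath.exp(2j * math.pi / r)
    s = 1 / math.sqrt(r)
    return [[s * w ** (j * k) for k in range(r)] for j in range(r)]


def apply_on_first(gate, state, r):
    """Apply an r x r gate to qudit 1 of a two-qudit state indexed a*r + b."""
    out = [0j] * (r * r)
    for a in range(r):
        for b in range(r):
            amp = state[a * r + b]
            if amp == 0:
                continue
            for a2 in range(r):
                out[a2 * r + b] += gate[a2][a] * amp
    return out


def controlled_sum(state, r):
    """|a, b> -> |a, (a + b) mod r>: the qudit analogue of CNOT."""
    out = [0j] * (r * r)
    for a in range(r):
        for b in range(r):
            out[a * r + (a + b) % r] += state[a * r + b]
    return out


def qudit_bell_state(r):
    """Entangle two qudits starting from |00>."""
    state = [0j] * (r * r)
    state[0] = 1 + 0j
    state = apply_on_first(chrestenson(r), state, r)
    return controlled_sum(state, r)
```

For r = 2 the same code reduces to the familiar Hadamard-plus-CNOT Bell state generator, because the 2 x 2 Chrestenson matrix is the Hadamard gate and addition mod 2 is the CNOT action.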
Massive multi-threading in GPUs imposes tremendous pressure on memory subsystems. Because peak memory bandwidth has improved only slowly, memory has become a bottleneck for both performance and energy efficiency in GPUs. In this work, we propose an integrated architectural scheme that optimizes memory accesses and thereby boosts the performance and energy efficiency of GPUs. First, we propose thread batch enabled memory partitioning (TEMP) to improve memory access parallelism. In particular, TEMP groups multiple thread blocks that share the same set of pages into a thread batch and binds each streaming multiprocessor (SM) to dedicated memory banks. TEMP then dispatches each thread batch to an SM to ensure highly parallel memory-access streaming. Second, a thread batch-aware scheduling (TBAS) scheme is introduced to improve memory access locality and reduce contention on memory controllers and interconnection networks. Experimental results show that the integrated TEMP and TBAS can achieve up to 10.3% performance improvement and 11.3% DRAM energy reduction across diverse GPU applications. We also evaluate the performance interference induced by CPU applications in a GPU+CPU heterogeneous system running our proposed schemes. Our results show that the proposed solution preserves the execution efficiency of GPU applications with negligible performance degradation of CPU applications.
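The TEMP grouping and dispatch steps can be illustrated with a small software model. This is a hypothetical sketch and not the hardware mechanism. It makes four assumptions: a thread batch is formed from blocks with identical page sets; SMs own equal, contiguous slices of the memory banks; batches are dispatched round-robin; and a page is placed in a bank of the SM that first touches it (page number modulo that SM's bank count). All function names are illustrative, and TBAS scheduling is not modeled.

```python
from collections import defaultdict


def form_thread_batches(block_pages):
    """Group thread blocks that touch exactly the same set of pages.

    block_pages maps a thread-block id to an iterable of page numbers.
    Returns a list of batches (lists of block ids) in first-seen order.
    """
    batches = defaultdict(list)
    for block, pages in block_pages.items():
        batches[frozenset(pages)].append(block)
    return list(batches.values())


def bind_sms_to_banks(num_sms, num_banks):
    """Give each SM an equal, contiguous slice of the memory banks."""
    if num_banks < num_sms:
        raise ValueError("need at least one bank per SM")
    per_sm = num_banks // num_sms
    return {sm: list(range(sm * per_sm, (sm + 1) * per_sm)) for sm in range(num_sms)}


def dispatch(batches, block_pages, num_sms, num_banks):
    """Round-robin batches over SMs and place each batch's pages in that SM's banks."""
    sm_banks = bind_sms_to_banks(num_sms, num_banks)
    sm_of_block, bank_of_page = {}, {}
    for i, batch in enumerate(batches):
        sm = i % num_sms
        banks = sm_banks[sm]
        for block in batch:
            sm_of_block[block] = sm
            for page in block_pages[block]:
                # A page keeps the bank chosen by the first batch that touches it.
                bank_of_page.setdefault(page, banks[page % len(banks)])
    return sm_of_block, bank_of_page
```

With four blocks forming two batches and four banks, each SM streams its batch from its own two banks, so the two batches never contend for the same bank.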
Spiking neural networks (SNNs) are artificial neural network models that more closely mimic biological neural networks. In addition to neuronal and synaptic state, SNNs incorporate time into their computational model. Since each neuron in these networks is connected to thousands of others, high bandwidth is required. Moreover, since spike timing is used to encode information in SNNs, very low communication latency is also required. 2D networks-on-chip (2D-NoCs) have been used to provide a scalable interconnection fabric in large-scale parallel SNN systems, and 3D integrated circuits (3D-ICs) have also attracted considerable attention as a potential way to resolve the interconnect bottleneck. Combining these two emerging technologies opens a new horizon for IC designs that must meet the stringent low-power and small-footprint requirements of emerging AI applications. In this work, we first present an efficient mathematical model to analyze the performance of different neural network topologies. Second, we present an architecture and two low-latency spike routing algorithms, named K-means based multicast routing (KMCR) and shortest path K-means based multicast (SP-KMCR), for a three-dimensional NoC of spiking neurons (3DNoC-SNN). The proposed system was validated with an RTL implementation, while area and power analysis was performed using 45-nm CMOS technology.
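The clustering idea behind K-means based multicast routing can be sketched as follows. This is a simplified cost model under stated assumptions and not the KMCR algorithm itself. Destinations in a 3D mesh are grouped with K-means using deterministic farthest-first initialization. The cluster member closest to each centroid serves as a hypothetical replication node. Cost is counted as Manhattan hops (as under dimension-order XYZ routing), with every packet copy counted separately. The function names are illustrative.

```python
def manhattan(a, b):
    """Hop count between two mesh nodes under dimension-order (XYZ) routing."""
    return sum(abs(x - y) for x, y in zip(a, b))


def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain K-means with deterministic farthest-first initialization."""
    cents = [points[0]]
    while len(cents) < k:
        cents.append(max(points, key=lambda p: min(sqdist(p, c) for c in cents)))
    centroids = [tuple(float(v) for v in c) for c in cents]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sqdist(p, centroids[i]))
            clusters[nearest].append(p)
        new = [tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[i]
               for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return clusters, centroids


def kmcr_hops(src, dests, k):
    """Hop cost when one packet copy goes to a per-cluster replication node."""
    clusters, centroids = kmeans(dests, k)
    total = 0
    for members, cen in zip(clusters, centroids):
        if not members:
            continue
        rep = min(members, key=lambda p: manhattan(p, cen))  # hypothetical replication node
        total += manhattan(src, rep)                          # single copy: source -> rep
        total += sum(manhattan(rep, d) for d in members)      # rep -> each destination
    return total


def unicast_hops(src, dests):
    """Baseline: a separate packet from the source to every destination."""
    return sum(manhattan(src, d) for d in dests)
```

For a source at (0, 0, 0) and six destinations forming two tight groups near (7, 7, 3) and (0, 7, 3), this model counts 31 hops with k = 2, compared with 79 hops for six separate unicast packets. The saving comes from sending a single copy over the long shared path to each cluster.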
Recent advances in microelectrode-dot-array (MEDA) based architectures for digital microfluidic biochips have brought a major enhancement in microfluidic operations compared with traditional lab-on-chip devices. One critical issue for MEDA based biochips is droplet transportation. MEDA allows dynamic routing of droplets of different sizes. In this paper, we propose a high-performance droplet routing technique for MEDA based digital microfluidic biochips. First, we propose the basic concept of a droplet movement strategy for MEDA based designs, together with a definition of strictly shielded zones within the MEDA layout. Next, we propose droplet transportation schemes for the MEDA architecture under different blockage or crossover conditions and estimate the route distance of each net offline. Finally, we propose a priority-based routing strategy that combines these transportation schemes. Concurrent droplet movements are scheduled in a time-multiplexed manner. Routing individual droplets in parallel while optimally sharing cells is challenging and yields a routing problem of higher complexity. The final compaction solution satisfies the timing constraint and improves fault tolerance. Simulations are carried out on the standard benchmark circuits Benchmark suite I and Benchmark suite III. Experimental results show satisfactory improvements and demonstrate a high degree of robustness for the proposed algorithm.
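The offline route-distance estimate and the priority ordering can be sketched with a breadth-first search over the microelectrode grid. This is a minimal sketch with assumptions. A droplet occupies a w x h rectangle tracked by its top-left cell, moves one cell per step, and must avoid a given set of blocked microelectrodes. Nets with longer estimated routes receive higher priority; this longest-first rule is an assumption. Shielded zones, crossover handling and time-multiplexed scheduling are not modeled, and the function names are hypothetical.

```python
from collections import deque


def route_length(grid_w, grid_h, blocked, size, start, goal):
    """Fewest single-cell moves for a size=(w, h) droplet whose top-left
    corner travels from start to goal, or None if the goal is unreachable."""
    dw, dh = size

    def fits(x, y):
        # Every cell covered by the droplet must be on the grid and unblocked.
        if x < 0 or y < 0 or x + dw > grid_w or y + dh > grid_h:
            return False
        return all((x + i, y + j) not in blocked for i in range(dw) for j in range(dh))

    if not fits(*start) or not fits(*goal):
        return None
    dist = {start: 0}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        if (x, y) == goal:
            return dist[(x, y)]
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt not in dist and fits(*nxt):
                dist[nxt] = dist[(x, y)] + 1
                queue.append(nxt)
    return None


def prioritize(nets, grid_w, grid_h, blocked):
    """Estimate each net's route length offline, then order routable nets longest first.

    nets maps a net name to (size, start, goal)."""
    est = {name: route_length(grid_w, grid_h, blocked, size, s, g)
           for name, (size, s, g) in nets.items()}
    routable = [name for name in nets if est[name] is not None]
    return sorted(routable, key=lambda name: -est[name]), est
```

On a 6 x 6 grid with a wall at x = 2 covering rows 0 to 3, a 1 x 1 droplet going from (0, 0) to (4, 0) must detour over the wall and needs 12 moves. An unobstructed net along row 5 needs only 4 moves, so the blocked net is routed first.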
Racetrack memory (RM), a new storage scheme in which information flows along a nanotrack, has been considered a potential candidate to replace hard disk drives (HDDs) in future high-density storage devices. The first RM technology, proposed by IBM in 2008, relies on a train of opposite magnetic domains separated by domain walls (DWs) and is named DW-RM. After ten years of intensive research, a variety of fundamental advances have been achieved; unfortunately, no product is available yet. Meanwhile, new concepts may be on the horizon. Recently, an alternative information carrier, the magnetic skyrmion, experimentally discovered in 2009, has been regarded as a promising replacement for DWs in RM, giving rise to skyrmion-based RM (SK-RM). Remarkable advances have been made in observing, writing, manipulating and deleting individual skyrmions. So, what is the relationship between DWs and skyrmions? What are the key differences between DWs and skyrmions, or between DW-RM and SK-RM? What benefits could SK-RM bring, and what challenges need to be addressed before it can be applied? In this paper, we intend to answer these questions through a comparative cross-layer study of DW-RM and SK-RM. This work will provide guidelines, especially for circuit and architecture researchers working on RM.
Brain-inspired hyperdimensional (HD) computing models neural activity patterns of the very size of the brain's circuits with points of a hyperdimensional space, that is, with hypervectors (i.e., ultrawide holographic words: D = 10,000 bits). At its very core, HD computing manipulates a set of seed hypervectors to build composite hypervectors representing objects of interest. An efficient hardware realization therefore demands memory optimizations and simple operations. We propose hardware techniques for optimizing HD computing, provided as a synthesizable VHDL library, that enable co-located implementation of both learning and classification tasks on only a small portion of Xilinx FPGAs. Our Pareto-optimal design is mapped onto only 18,340 CLBs of an FPGA and simultaneously achieves 2.39× lower area and 986× higher throughput than the baseline. This is accomplished by: (1) rematerializing hypervectors on the fly, substituting cheap logical operations for expensive memory accesses to seed hypervectors; (2) online and incremental learning from different gesture examples while staying in the binary space; and (3) combinational associative memories to steadily reduce classification latency.
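The rematerialization idea in (1) and the Hamming-distance associative lookup in (3) can be illustrated in software. This is a minimal sketch under several assumptions. Only two random seed hypervectors are stored, and the item for channel i and the item for level v are regenerated as cyclic rotations of those seeds; this cheap permutation stands in for the paper's logical operations. Binding is XOR, and bundling is bitwise majority with ties broken toward 0. Training here simply keeps the encoded examples for each class and bundles them at query time, which is a simplification rather than the binary online learning in (2). The class and method names are hypothetical.

```python
import random

D = 10_000                    # hypervector width, as in the text
MASK = (1 << D) - 1


def rotl(hv, k):
    """Cyclic left rotation by k bits: regenerates a new item from a seed."""
    k %= D
    return ((hv << k) | (hv >> (D - k))) & MASK


def hamming(a, b):
    return bin(a ^ b).count("1")


def majority(hvs):
    """Bitwise majority (bundling); ties are broken toward 0 (assumption)."""
    n = len(hvs)
    counts = [0] * D
    for hv in hvs:
        for i, bit in enumerate(format(hv, f"0{D}b")):
            if bit == "1":
                counts[i] += 1
    return int("".join("1" if 2 * c > n else "0" for c in counts), 2)


class HDClassifier:
    def __init__(self, seed=0):
        rng = random.Random(seed)
        # Only two seed hypervectors are stored; all items are rematerialized.
        self.channel_seed = rng.getrandbits(D)
        self.level_seed = rng.getrandbits(D)
        self.examples = {}    # label -> list of encoded training hypervectors

    def encode(self, levels):
        # Channel i -> rotl(channel_seed, i), level v -> rotl(level_seed, v);
        # bind each pair with XOR, then bundle all channels by majority.
        bound = [rotl(self.channel_seed, i) ^ rotl(self.level_seed, v)
                 for i, v in enumerate(levels)]
        return majority(bound)

    def train(self, levels, label):
        self.examples.setdefault(label, []).append(self.encode(levels))

    def classify(self, levels):
        # Associative memory: the prototype with the smallest Hamming distance wins.
        query = self.encode(levels)
        prototypes = {lbl: majority(hvs) for lbl, hvs in self.examples.items()}
        return min(prototypes, key=lambda lbl: hamming(query, prototypes[lbl]))
```

Because a query that differs from a training pattern in a single channel changes only a small fraction of the bundled bits, it remains far closer to its own class prototype than to prototypes built from unrelated patterns.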