Looking Beyond CMOS Technology for Future HPC
April 5, 2016
7:00-8:00 AM: Breakfast and Registration
8:00-8:15 AM: Welcome and Introductions - Neena Imam
The United States has a new HPC roadmap that includes the National Strategic Computing Initiative to build within the next decade exaflop (10^18 operations per second) supercomputers that can analyze up to one exabyte (10^18 bytes) of data. Exascale computing will address key questions of national significance and help maintain U.S. competitiveness. This talk will discuss the current state of supercomputing, the exascale requirements, and the possible ways of achieving this crucial milestone.
Manufacturers are reaching a new phase in CMOS evolution using advanced device architectures and new fabrication processes. We may be able to reach exascale by leveraging the new generation of CMOS logic. Other necessary boosts in our near-term computing power will come from systems engineering via multi-core/many-core/accelerator architectures, 3D integration and packaging for deeper memory/storage hierarchy, and photonics integration to enable faster data transport.
As we prepare for exascale, we must also establish new hardware technologies and a related ecosystem for future HPC, even after the limits of Moore's Law are reached due to the constraints of fundamental physics. Although Moore's Law is slowing down, its end will spur innovation and lead us to new architectures that will help us solve problems that are not solvable today. We discuss some of these exciting new possibilities.
Session 1: Nanomaterials for Future HPC
Since we are rapidly approaching the minimum effective size of CMOS circuitry, nano-electronics might hold the answers to increasing device capabilities at reduced footprint and power. In this session, we will discuss disruptive technologies such as carbon-nanotube-based transistors and HPC applications of carbon-based nanomaterials such as graphene. We will also discuss the technical challenges of maintaining the advantageous properties of nanomaterials at very large scale.
Session Chair: Dr. Barney Maccabe, Computing and Computational Sciences Directorate, Oak Ridge National Laboratory
9:15-9:50 AM: Dr. Victor Zhirnov, Semiconductor Research Corporation, Future Information Processing: From Materials to Systems
Abstract: It is becoming increasingly clear that in future information processing applications, synergistic innovations in materials, devices, and system architectures will be a key to achieving new levels of performance. We will examine the physics of extreme scaling of information processing devices and systems, with a focus on energy minimization, and discuss the materials and architectural implications for system scaling. The connection between the device physics in the Boltzmann-Heisenberg limits and the parameters of the digital circuits implemented from these devices will be explored. An abstraction of a Minimal Turing Machine built from the limiting devices will be used to provide insight on the "intelligence" that could be expected from a volume of matter.
9:50-10:25 AM: Dr. Olga S. Ovchinnikova, Center for Nanophase Materials Sciences, Oak Ridge National Laboratory, Directing Matter: Towards Atomic Scale 3D Nanofabrication
Abstract: To move beyond current technologies, the design of both silicon-based and functional materials requires material control at the level of individual atomic configurations and single atom positions. This is also important for understanding and harnessing energy where disruptive improvements in efficiency are required. Achieving this ultimate limit in the science and technology of material and device performance, from quantum computing to nonvolatile memories and from thermoelectrics to superconductors, requires design and control of matter with atomic, molecular, and mesoscale fidelity by developing precision synthesis tools. Here we will discuss the development of pathways to direct matter in a scalable fashion to fabricate/design three-dimensional structures of a variety of materials with atom-by-atom and defect-level control of their shape and composition.
10:25-11:00 AM: Mr. Max Shulaker, Stanford University, Transforming Nanodevices into Nanosystems: the N3XT 1,000X
Abstract: Future computing demands far exceed the capabilities of today's electronics, and cannot be met by isolated improvements in transistor technologies, memories, or integrated circuit (IC) architectures alone. The N3XT (Nano-Engineered Computing Systems Technology) approach overcomes these challenges through recent advances across the computing stack: (a) transistors using nanomaterials such as one-dimensional carbon nanotubes (and two-dimensional semiconductors) for high performance and energy efficiency, (b) high-density non-volatile resistive and magnetic memories, (c) ultra-dense (e.g., monolithic) three-dimensional integration of logic and memory for fine-grained connectivity, (d) new architectures for computation immersed in memory, and (e) new materials technologies and their integration for efficient heat removal. N3XT hardware prototypes represent leading examples of transforming scientifically-interesting nanomaterials and nanodevices into actual nanosystems. Compared to conventional approaches, N3XT architectures promise to improve energy efficiency significantly, in the range of three orders of magnitude, thereby enabling new frontiers of high performance computing.
11:00-11:15 AM: Short Break
Abstract: The 3D Tri-gate or FinFET technology market is expected to grow from $5 billion in 2015 to $35 billion by 2022, spanning several technology nodes (22nm, 14nm, 10nm, and 7nm) and a range of product offerings from CPUs, SoCs, FPGAs, GPUs, and MCUs to network processors. The 3D Tri-gate transistor enjoys fundamental advantages over its planar counterpart: superior performance at lower supply voltages, significantly higher drive current per layout width, and improved intrinsic threshold-voltage variation. In this talk, I will cover the basic operating principles of Tri-gate transistors from device-physics and circuit-design perspectives. I will also present opportunities and challenges of advanced band-engineered FinFET concepts, such as germanium and compound-semiconductor-based quantum-well FinFETs and stacked gate-all-around (GAA) nanowire FETs, that are being actively pursued to extend technology scaling beyond the 5nm node for HPC applications over the next decade.
11:50 AM-12:25 PM: Dr. Ali Adibi, Department of Electrical and Computer Engineering, Georgia Tech, Integrated Nanophotonic Structures for Optical Computing
Abstract: The use of rich nonlinear dynamics in photonic resonators has emerged as a potential means for analog computing. This talk will review recent advances in optical computing and present the unique features of nanophotonic structures for forming analog computing modules in an integrated, chip-scale platform. In addition, recent achievements in the development of ultra-compact, high-performance, resonance-based reconfigurable photonic devices and subsystems (as enabling tools for optical computing) will be demonstrated.
12:25-1:30 PM: Lunch and Discussion of Morning Session
Session 2: Quantum Computing
Quantum computing paradigms challenge our traditional understanding of nature by utilizing the quantum phenomenon of superposition of states. In theory, quantum computing can provide parallel processing capabilities that cannot be matched by traditional computing logic. However, many technical challenges for quantum computing must be addressed, such as the extraction of information from the system, materials for physically implementing a quantum computer, and the lack of a software stack for programming quantum processors. In this session, we will discuss the programming environment, quantum compilers, benchmarks/metrics, and materials research for quantum computing.
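As a toy illustration of the superposition the session description refers to (not part of the workshop program, and a classical simulation rather than real quantum hardware), a few lines of Python show how applying a Hadamard gate to each of n qubits places a register into an equal superposition of 2^n basis states:

```python
import math

def hadamard_all(n):
    """Classically simulate applying a Hadamard gate to each of n qubits,
    starting from |0...0>. Returns the full state vector of 2**n amplitudes."""
    state = [1.0] + [0.0] * (2**n - 1)  # amplitude 1 on |0...0>
    for q in range(n):
        new = state[:]
        bit = 2**q
        for i in range(2**n):
            if i & bit == 0:  # pair basis states differing only in qubit q
                a, b = state[i], state[i | bit]
                new[i] = (a + b) / math.sqrt(2)
                new[i | bit] = (a - b) / math.sqrt(2)
        state = new
    return state

amps = hadamard_all(3)
probs = [a * a for a in amps]
# all 8 amplitudes equal 1/sqrt(8): measuring yields any basis state
# with probability 1/8
```

Note the classical cost: the state vector doubles with each added qubit, which is precisely the exponential parallelism that physical qubits would provide natively.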
Session Chair: Dr. Neena Imam, Computing and Computational Sciences Directorate, Oak Ridge National Laboratory
1:30-2:10 PM: Dr. David Dean, Physical Sciences Directorate, Oak Ridge National Laboratory, Quantum Computing Materials and Interfaces
Abstract: This talk will describe ORNL's Laboratory Directed Research and Development Initiative on Quantum Computing Materials and Interfaces. This initiative, which began with the selection of four projects that started in October 2015, will integrate core competencies in materials, modeling, and isotopes to establish a broad R&D effort in quantum computing. The initiative will create an S&T base at the Laboratory to drive computing beyond the exascale. Our strategy is to develop tools necessary to characterize and design high-fidelity physical qubits, to explore methods to interface qubits to traditional computers, to develop a multi-qubit research test bed, to research methods to program multi-qubit systems, and to foster multiagency ties to perform such research in the longer term.
2:10-2:50 PM: Mr. Steve Reinhardt, D-WAVE Systems, Co-Evolving Quantum Architecture with Early Application Feedback
Abstract: Moore's Law advances in classical computing have slowed recently, creating an opportunity for other types of computing to emerge. Quantum computing is one such candidate, with many implementations being explored. The D-Wave 2X™ quantum annealer, currently with a thousand qubits, is the only commercially available quantum computer; Google has reported differentiated performance results for important common problems. Starting from a description of today's D-Wave hardware and software architecture, we describe some possible evolutionary paths to greater generality. The actual path, and date of widespread usefulness, will depend heavily on the feedback D-Wave receives from early application developers and users over the next few years.
2:50-3:20 PM: Afternoon Break and Refreshments
3:20-4:00 PM: Dr. Mark Novotny, Department Head, Physics and Astronomy Department, Mississippi State University, Quantum Computers: Testing and Selected Applications
Abstract: Adiabatic Quantum Computing (AQC) holds the potential to enable solutions to problems that are not in the complexity class P in computational complexity theory. Availability of D-Wave machines with more than 1000 qubits provides a new tool in computational studies. We have tested the D-Wave machines to better understand their limitations, applications, and how the next generation may be enhanced. We describe our tests of the D-Wave, which include random spanning tree studies, planted solutions, and an answer checking procedure. Our main application is to Boltzmann machines for machine learning. Our enhancements for the next generation include an answer checking paradigm and small-world connections added to the D-Wave Chimera lattice.
Abstract: Quantum computing promises new opportunities for solving hard computational problems, but harnessing this novelty will require breakthrough concepts in the design, operation, and application of computing systems. In this talk, we define some of the challenges facing the development of quantum computing systems as well as software-based approaches that can be used to overcome these challenges. Following a brief overview of the state of the art, we present recent advances in the modeling and simulation of quantum computing systems, the development of architectures for hybrid high-performance computing systems, and the realization of software stacks for quantum computing. This leads to a discussion of the role that conventional computing plays in the quantum paradigm and how some of the current challenges for exascale computing overlap with those facing quantum computing.
4:40-5:20 PM: Dr. David C. McKay, IBM T.J. Watson Research Center, Universal Quantum Computing at IBM: Successes and Challenges
Abstract: A universal quantum computer - a computer that harnesses the power of quantum entanglement - promises to revolutionize our ability to solve certain problems, such as factoring, search, and molecular design. However, quantum states are fragile, and so constructing a robust quantum computing device is a formidable challenge. At IBM we are pursuing a bottom-up approach by first demonstrating high-fidelity gates between a small number of quantum bits ("qubits") as we move towards producing logical qubits that are protected against errors by continuous "quantum parity" measurement. Most recently we demonstrated the ability to detect arbitrary errors using a four-qubit configuration [Corcoles et al. Nat. Comm. 6, 6979 (2015)]. In this talk, I will discuss our current progress towards a logical qubit and novel designs for high-fidelity qubit interactions. I will also discuss the challenges that we must address before we can realize a fully universal quantum computer, and the applications for first-generation systems with on the order of 100 qubits.
5:20 PM: Adjourn
April 6, 2016
7:30-8:30 AM: Breakfast and Registration
8:30-9:30 AM: Keynote Speech 2: Dr. Eng Lim Goh, Chief Technology Officer and Senior Vice President, SGI, Challenges Awaiting Solutions Beyond CMOS
Abstract: In addition to classes of superpolynomial problems, there are challenges with current compute-intensive, data-intensive, and aspiring artificially intelligent systems that may only have practical solutions beyond CMOS. We will discuss examples of these.
Session 3: Superconducting Computing
Superconducting computing may be an attractive alternative to CMOS circuits, with many potential advantages such as ultrafast switching times and very low energy consumption. This session will discuss recent investments in superconducting computing by federal agencies, the state of the art in superconducting circuits, the very difficult task of suitable cryogenic memory design, and fabrication challenges.
Session Chair: Mr. John Bolger, United States Department of Defense
9:30-10:15 AM: Dr. Scott Holmes, IARPA, Superconducting Computing and the IARPA C3 Program
Abstract: Superconducting computing is a potential solution for energy-efficient, large-scale computing. The Intelligence Advanced Research Projects Activity (IARPA) Cryogenic Computing Complexity (C3) program was established to demonstrate a complete superconducting computer, including processing units and cryogenic memory. IARPA expects the C3 program to be a five-year, two-phase program. Phase one, which is ongoing and encompasses the first three years, primarily serves to develop the technologies required to separately demonstrate a small superconducting processor and cryogenic memory units. Phase two, which is expected to last an additional two years, will integrate those new technologies into a small-scale working model of a superconducting computer. Program goals and metrics are presented, along with the main technical challenges and approaches to overcome them.
Abstract: This talk will provide an overview of the status of single flux quantum (SFQ) superconducting circuit fabrication and electronic design automation (EDA) tools. It will include a summary of SFQ fabrication processes around the world, and it will compare and contrast SFQ fabrication with CMOS fabrication. We will also describe the IARPA roadmap for SFQ fabrication being carried out at MIT Lincoln Laboratory within the Cryogenic Computing Complexity (C3) Program.
11:00-11:30 AM: Break
11:30 AM-12:15 PM: Dr. Kenneth Zick, Northrop Grumman Corporation, Architectural Challenges in the Quest for Superconducting Computing
Abstract: Given recent developments in superconducting electronics, we can begin to consider some of the architectural challenges on the road to superconducting computing. In this talk, we will identify and discuss some of these relevant challenges.
12:15-1:15 PM: Lunch and Discussion of Evening Panel Topics
Session 4: Emerging Processor and Memory Architectures
The fourth focus area of this workshop is emerging processor and memory technologies that can overcome the von Neumann bottleneck for enhanced data analytics and storage support. This session will also cover technologies such as memristors, nanocrystal-based HPC storage, and 3D processor and memory architectures.
Session Chair: Ms. Janice Elliott, United States Department of Defense
Abstract: With the end of Moore's Law in sight, various technologies and models of computing are being actively pursued as potential replacements. The NSCI explicitly includes beyond-CMOS technologies, and the recent OSTP grand challenge presumes that neuromorphic computing is an important approach. This talk provides an overview of this emerging model of computing, highlighting existing initiatives, recent accomplishments, and current challenges.
2:00-2:45 PM: Dr. Rao Kosaraju, National Science Foundation, An Overview of the Research Supported by the Division of Computing and Communication Foundations (CCF) at NSF
Abstract: This talk will first give an overview of the research supported by the Division of Computing and Communication Foundations. It will then outline existing and new research initiatives to overcome the Moore's Law bottleneck.
2:45-3:15 PM: Afternoon Break and Refreshments
Abstract: The Automata Processor (AP) is an upcoming co-processor that can be programmed to compute a large set of user-defined nondeterministic finite automata (NFA) in parallel against a single query data stream. Defining an NFA to be programmed into the processor is simple, akin to creating classical state diagrams. Once the automata are compiled and programmed into the processor, a query stream can be streamed to it. Internally, each byte in the data stream is broadcast to all the processing elements in the chip, so that all paths in an NFA, and all NFAs in the chip, can be examined in parallel. Any NFA matched during streaming is reported along with the offset in the query stream where the match occurred. This provides a greatly simplified programming and execution interface without the need to handle communication delays, race conditions, and other issues typical of contemporary reconfigurable processors. The AP board combines the processing capability of 32 AP chips into a single PCIe-based accelerator board, cumulatively providing the programmable resources to emulate over 1.5 million edge transitions per board. Multiple chips on an AP board can be combined to function as a single logical core, each core processing a single data stream at 1 Gbps. The AP board can currently process up to 8 streams in parallel; this number is expected to double in the next generation of hardware. When the processor was publicly unveiled in 2013, it was six years into its making. Today, "engineering samples" have been showcased at leading conferences running demo applications based on our previous work. A limited set of "pre-production" AP boards is now being supplied to select research labs, universities, and industrial collaborators, with general engineering availability scheduled for June and full-scale manufacturing in the second half of 2016. Researchers can also obtain and use the AP SDK for design validation and run-time simulation.
A Center for Automata Processing (CAP) has also been set up at the University of Virginia to allow members to access a computing cluster containing AP boards. Owing to the architecture and operation of the Automata Processor, identifying features in query strings (defined as finite state machines) follows naturally. However, researchers have now developed a variety of core algorithmic techniques that have significantly expanded the processor's scope of use. These include regular-expression engines, de novo motif searches for biological sequences, numerical analysis, machine learning, association rule mining, and algorithmic techniques for large-scale graph analytics. This talk will briefly discuss the basics of the AP architecture, which algorithms are well suited for the AP, and some examples of AP algorithm research and implementations.
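The streaming model the abstract describes can be sketched in a few lines of Python. This is an illustrative software simulation of broadcast-style NFA matching, not Micron's actual AP programming interface: every input byte is delivered to all currently active states at once, and each accepting state reached is reported with its stream offset.

```python
def run_nfa(transitions, start, accept, stream):
    """Simulate an NFA against a byte stream, reporting (offset, state)
    whenever an accepting state is active -- loosely analogous to the AP's
    report-on-match behavior. `transitions` maps (state, byte) to a set of
    next states; each byte is 'broadcast' to all active states in parallel."""
    active = {start}
    reports = []
    for offset, byte in enumerate(stream):
        nxt = set()
        for s in active:
            nxt |= transitions.get((s, byte), set())
        nxt.add(start)  # restart matching so occurrences anywhere are found
        active = nxt
        for s in active & accept:
            reports.append((offset, s))
    return reports

# Hypothetical two-edge NFA recognizing the pattern b"ab":
# state 0 -> 1 on 'a', state 1 -> 2 (accepting) on 'b'
t = {(0, ord('a')): {1}, (1, ord('b')): {2}}
matches = run_nfa(t, start=0, accept={2}, stream=b"xabyab")
# -> [(2, 2), (5, 2)]: the pattern ends at offsets 2 and 5
```

On the AP, this inner loop over active states is what the hardware performs in a single cycle per input byte, which is the source of its throughput advantage over sequential NFA simulation.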
4:00-4:45 PM: Dr. Samira Khan, University of Virginia, Rethinking Memory and Storage for Future Computing Systems
Abstract: For future HPC systems, we need to rethink and redesign our computing model with a focus on minimizing data movement and storage. In this talk, I will present three high-level directions for solving the memory challenges in future systems, focusing on across-the-stack solutions. First, I will talk about how we can continue DRAM scaling yet improve reliability, latency, and cost by rethinking the interface between circuits and the system. Second, I will discuss how we can unify the conventional memory and storage hierarchy by leveraging emerging non-volatile memory technologies. Third, I will present my vision of how we need to fundamentally rethink our computing model and redesign processing as near-data computation at every level of the system.
4:45 PM: Adjourn
6:15-8:00 PM: Panel Discussion and Dinner
The workshop will conclude with a panel discussion to address questions such as (1) what technology is "most likely to succeed" as the next big thing in HPC, (2) how long before it could be in a system doing something useful in HPC, and (3) what key issues need to be resolved for that technology.
Moderator: Mr. Steve Pritchard, United States Department of Defense