This document provides an introduction to neuro Cell Technology (nCell). nCell represents a revolutionary extension of conventional multi-processor architecture. This post discusses the problem, the design concept, the architecture and programming models, and the implementation.
Over the last 20 years digital electronics has evolved and improved exponentially. The performance of devices has roughly doubled every 18 months as transistor size and the cost of chips have shrunk at an impressive pace. Unrelenting advances in the transistor density of integrated circuits have produced a large number of engineered systems with diverse functional characteristics to meet the varied demands of human life, ranging from micro-embedded devices to implantable medical devices and smart sensors. However, the complexity of system-level design for these increasingly sophisticated engineered systems is further compounded when interdisciplinary requirements are included, for example, massive integration and interconnection between components and subsystems, feedback, and redundancy.
The continued shrinking of electronic technology and the VLSI design complexity of these systems have resulted in substantial increases in both the number of hard errors, mainly due to process variation, material defects, and physical failure, and the number of soft errors, primarily due to alpha particles from normal radioactive decay, cosmic rays striking the chip, or simply random noise. Although these complex systems are designed to guarantee robust operation against the events anticipated and accounted for in the design blueprint, the reality is that most engineered systems still operate under significant uncertainty. A good description of the risk related to modern semiconductor fabrication can be found in the Qualcomm 2011 annual report:
“Our products are inherently complex and may contain defects or errors that are detected only when the products are in use. For example, as our chipset product complexities increase, we are required to migrate to integrated circuit technologies with smaller geometric feature sizes. The design process interface issues are more complex as we enter into these new domains of technology, which adds risk to yields and reliability. Manufacturing, testing, marketing and use of our products and those of our customers and licensees entail the risk of product liability. Because our products and services are responsible for critical functions in our customers’ products and/or networks, security failures, defects or errors in our components, materials or software or those used by our customers could have an adverse impact on us, on our customers and on the end users of their products. Such adverse impact could include product liability claims or recalls, a decrease in demand for connected devices and wireless services, damage to our reputation and to our customer relationships, and other financial liability or harm to our business”.
As a consequence, VLSI designers have increased core counts to sustain Moore’s Law scaling. It is important to understand that Moore’s Law is still relevant: in the shift towards multicore processors, transistor counts are continuing to increase. The main problem is that it is no longer possible to keep running these transistors at ever faster speeds. The reason is the breakdown of “Dennard scaling”, named after Robert Dennard, who led the IBM research team that described the effect in 1974. Dennard’s key observation was that as transistors became smaller, power density stayed constant: if a transistor’s linear size is reduced by a factor of 2, the power it uses falls by a factor of 4. That scaling has since broken down: as voltages have dropped, static (leakage) power losses have increased rapidly as a proportion of the overall power supplied. And static power losses heat the chip, further increasing static power loss and threatening thermal runaway.
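Dennard’s observation can be made concrete with the standard dynamic-power model (the symbols below are the textbook ones, not notation from the source). Scaling every linear dimension by a factor \(\kappa\) gives:

```latex
P = C V^{2} f
\;\xrightarrow{\;C \to C/\kappa,\;\; V \to V/\kappa,\;\; f \to \kappa f\;}\;
P' = \frac{C}{\kappa} \cdot \frac{V^{2}}{\kappa^{2}} \cdot \kappa f
   = \frac{P}{\kappa^{2}}
```

Since the transistor’s area also shrinks by \(\kappa^{2}\), the power density \(P/A\) is unchanged; for \(\kappa = 2\) the power per transistor falls by a factor of 4, as stated above.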
The failure of Dennard scaling, to which the shift to multicore parts is partially a response, may soon limit multicore scaling as well: adding more cores will not provide sufficient benefit to justify continued process scaling. If multicore scaling ceases to be the primary driver of performance gains at 16 nm, the “multicore era” will have lasted less than a decade. Clearly, architectures that move well beyond the energy/performance of today’s designs will be necessary.
Radical or even incremental ideas simply cannot be developed along typical academic research and industry product cycles. We may hit a “transistor utility economics” wall in a few years, at which point Moore’s Law may end, creating massive disruptions in our industry. However, there is a great opportunity for a new generation of computer architects to deliver performance and efficiency gains that work across a wide range of problems. It is clear now that the energy efficiency of devices is not scaling along with integration capacity, and the big opportunity today for the semiconductor industry is “Big Data” and emerging domains such as recognition, mining, and synthesis that can efficiently use a many-core chip. The objective of processing “Big Data” is to transform data into knowledge; the probabilistic nature of “Big Data” opens new frontiers and opportunities.
To ensure the appropriate operation of complex semiconductor devices under highly unreliable circumstances, a new paradigm for the architectural design, analysis, and synthesis of engineered semiconductor systems is needed. It is therefore imperative that VLSI designers build robust and novel fault tolerance into computational circuits, and that these designs be able to detect and recover from damage that would otherwise cause the system to process incorrectly or fail outright. Furthermore, these new generations of processors feature standard compilers and a new programmability paradigm that exploit machine learning operations for better performance and energy efficiency. Thus, our approach is an algorithmic transformation converting diverse regions of code from a Von Neumann model to a cognitive computing model, enabling much larger performance and efficiency gains.
nCell Technology: Revolutionizing Computing
We recently introduced a cognitive computing SoC architecture (patent pending): the idea of a self-reconfigurable machine learning processor with an innovative mechanism for ensuring appropriate operational levels during and after unexpected natural or man-made events that could impact critical engineered systems in unforeseen ways, or for taking advantage of unexpected opportunities. Self-reconfigurable cognitive computing processors refer to a system’s ability to change its structure, its operations, or both in response to an unforeseen event in order to meet its objectives. This technology is realized and advanced using a powerful state-of-the-art computational platform and techniques, including innovative hardware architecture, software, networks, and adaptive computation, which can provide the capability for embedding machine learning reconfigurability into complex engineered semiconductor devices. This technology will not only improve the computational efficiency of software operations, but will also provide a robustness that results in higher yields from semiconductor manufacturing.
Figure 1. nCell next generation of intelligent computing machines (ICM): self-programmable, high-performance, embedded cognitive computing processors. Through this revolutionary computing architecture, computers will intelligently understand data, generating countless revolutionary advances in human knowledge and man-machine interaction.
To software/platform developers we are introducing a novel yet simple concept for massively parallel processor programming: the nCell transformation. The nCell transformation is an algorithmic transformation that converts regions of imperative code into cognitive computing cellular operations. Because nCell exposes considerable parallelism and consists of simple operations, it can be used efficiently as an on-chip hardware accelerator. Therefore, the nCell transformation can yield significant performance and energy improvements.
The transformation uses a training-based approach to produce a cognitive computing model that approximates the behavior of the candidate code. A transformed program runs primarily on the main core and invokes the nCell structure, the cognitive processing cellular structure, to perform machine learning evaluation instead of executing the replaced code in conventional computation mode.
This transformation structurally and semantically changes the code and replaces various unstructured regions of code with structured fine-grained parallel computation (nCell).
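The training-based transformation described above can be sketched in a few lines. This is a minimal stand-in, assuming a polynomial fit plays the role of the nCell cognitive model; the source does not specify the model class or any API.

```python
import numpy as np

def candidate(x):
    """Original imperative region: a precise, relatively costly computation."""
    return np.sin(x) * np.exp(-0.1 * x)

# Training phase: observe input/output pairs from the original code.
xs = np.linspace(0.0, 10.0, 200)
ys = candidate(xs)

# Fit a simple learned model.  A degree-9 polynomial stands in for the
# nCell cellular structure here; it is purely illustrative.
model = np.polynomial.Polynomial.fit(xs, ys, deg=9)

# Execution phase: the transformed program invokes the model instead of
# the replaced code.
def transformed(x):
    return model(x)

err = float(np.max(np.abs(transformed(xs) - ys)))
print(f"max approximation error on the training range: {err:.4f}")
```

Because the transformed region is an approximation, such a rewrite only makes sense for code whose consumers tolerate small errors, exactly the recognition, mining, and synthesis domains discussed here.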
Relaxing the requirement for perfectly precise conventional arithmetic operations at the device, circuit, architecture, and programming-language levels can provide significant opportunities to improve performance and energy efficiency in the domains where applications require cognitive computing. This is possible because of the evolution of the internet of things, machine-to-machine communication, and social networks, and the subsequent generation of big data, which is central to the future of data centers.
Services such as Google, YouTube, Spotify, and Netflix will be enhanced as they run intensive recommendation engines personalized for each user. Recognition systems have similar machine-learning requirements; examples include Apple’s Siri, Shazam, and Google Glass’s voice and visual recognition system. 1000eyes is another example, which uses image clustering, essentially searching for products based on a photo a user submits. Data mining in a sea of Big Data such as Facebook’s repository will also benefit from cognitive computing capability, as will the complex control systems used in driverless vehicles. Machine-learning-enabled network security is also an area of huge importance.
At the architecture level, we are introducing a machine-learning instruction set that allows conventional Von Neumann processors to transform precise operations into machine learning operations. It allows the compiler to convey what can be computed by nCell without necessarily specifying how, letting the microarchitecture choose from a set of machine learning operations without exposing them to software.
Figure 2. From annotated code to accelerated execution on a cognitive computing-unit (nCell) augmented core. The nCell transformation has three phases: programming, in which the programmer marks code regions to be transformed; compilation, in which the compiler selects and trains a suitable machine learning operation and replaces the original code with a machine learning invocation; and execution.
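The three phases in the caption can be mocked up in software. Everything below, including the `approximable` decorator and the nearest-neighbour surrogate, is a hypothetical illustration, not the actual nCell toolchain or annotation syntax.

```python
# Hypothetical annotation interface: the source defines no concrete syntax,
# so the decorator name and registry below are assumptions.
_registry = {}

def approximable(fn):
    """Programming phase: the programmer marks a code region to transform."""
    _registry[fn.__name__] = fn
    return fn

@approximable
def distance(x, y):
    return ((x - y) ** 2) ** 0.5

def compile_region(name, samples):
    """Compilation phase: train a surrogate from observed input/output pairs.
    A nearest-neighbour lookup table stands in for the machine learning
    operation the microarchitecture would choose."""
    fn = _registry[name]
    table = {s: fn(*s) for s in samples}
    def surrogate(*args):
        nearest = min(table,
                      key=lambda s: sum((a - b) ** 2 for a, b in zip(s, args)))
        return table[nearest]
    return surrogate

# Execution phase: the program invokes the surrogate instead of the original.
surrogate = compile_region("distance", [(0, 0), (0, 3), (3, 0), (1, 4)])
print(surrogate(0.1, 2.9))  # nearest training sample is (0, 3) -> 3.0
```

A real compiler would of course select and train the model automatically; the point of the sketch is only the division of labour between the three phases.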
The nCell transform is also a general and exact technique for parallel programming of a large class of machine learning algorithms on multicore processors. The central idea of this approach is to allow a future programmer to implement machine learning applications using the generic nCell structure rather than searching for specialized optimizations. In this form the transform does not change the underlying algorithm, so it is not an approximation but an exact implementation.
Many in the computing community believe that exponential core scaling will continue into the hundreds or thousands of cores per chip, auguring a parallelism revolution in hardware and software. However, while transistor-count increases continue at traditional Moore’s Law rates, per-transistor speed and energy-efficiency improvements have slowed dramatically. Under these conditions, more cores are only possible if the cores are slower, simpler, or less utilized with each additional technology generation. VIMOC Technologies has introduced a disruptive innovation in the form of an IP silicon processor architecture to enable next-generation cognitive computing SoCs with improved performance and energy efficiency. Furthermore, this technology will offer significant advantages for building advanced micro-servers and mobile computing devices, forming low-cost cognitive computing datacenter infrastructure and enabling a high level of security, reliability, and innovative big-data-driven applications.