neuro-Cell: A cognitive computing System-on-Chip

This document provides an introduction to neuro-Cell technology (nCell). nCell represents a revolutionary extension of conventional multi-processor architecture. This post discusses the problem, the design concept, the architecture and programming models, and the implementation.

Problem
Over the last 20 years, digital electronics has evolved and improved exponentially. The performance of devices has roughly doubled every 18 months as transistor sizes and chip costs have shrunk at an impressive pace. Unrelenting advances in the transistor density of integrated circuits have produced a large number of engineered systems with diverse functional characteristics that meet the varied demands of everyday life, from micro-embedded devices and implantable medical devices to smart sensors. However, the complexity of system-level design for these increasingly sophisticated systems is further compounded when interdisciplinary requirements are included, for example massive integration and interconnection between components and subsystems, feedback, and redundancy.

Shrinking process geometries and growing VLSI design complexity have resulted in substantial increases both in hard errors, mainly due to process variation, material defects, and physical failure, and in soft errors, primarily due to alpha particles from normal radioactive decay, cosmic rays striking the chip, or simply random noise. Although these complex systems are designed to guarantee robust operation against the events that have been anticipated and accounted for in the design blueprint, the reality is that most engineered systems still operate under great uncertainty. A good description of the risk related to modern semiconductor fabrication can be found in the Qualcomm 2011 annual report:
“Our products are inherently complex and may contain defects or errors that are detected only when the products are in use. For example, as our chipset product complexities increase, we are required to migrate to integrated circuit technologies with smaller geometric feature sizes. The design process interface issues are more complex as we enter into these new domains of technology, which adds risk to yields and reliability. Manufacturing, testing, marketing and use of our products and those of our customers and licensees entail the risk of product liability. Because our products and services are responsible for critical functions in our customers’ products and/or networks, security failures, defects or errors in our components, materials or software or those used by our customers could have an adverse impact on us, on our customers and on the end users of their products. Such adverse impact could include product liability claims or recalls, a decrease in demand for connected devices and wireless services, damage to our reputation and to our customer relationships, and other financial liability or harm to our business”.

As a consequence, VLSI designers have increased core counts to sustain Moore’s Law scaling. It is important to understand that Moore’s Law is still relevant: in the shift towards multicore processors, transistor counts continue to increase. The main problem is that it is no longer possible to keep running these transistors at ever faster speeds. This is the breakdown of “Dennard scaling”, named after Robert Dennard, who led the IBM research team that described the scaling rule in 1974. Dennard’s key observation was that as transistors become smaller, power density stays constant: if a transistor’s linear dimensions are reduced by a factor of 2, the power it uses falls by a factor of 4. That rule held only as long as supply voltages could keep dropping; as voltage scaling slowed, static (leakage) power losses increased rapidly as a proportion of overall power supplied. And static power losses heat the chip, further increasing static power loss and threatening thermal runaway.
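As a back-of-the-envelope restatement of the scaling rule above (a sketch using the standard dynamic-power model, not Dennard’s original notation): with dynamic power P = C V² f, capacitance C ∝ L, supply voltage V ∝ L, and clock frequency f ∝ 1/L, shrinking the linear dimension L by a factor k gives

\[
P \;\to\; \frac{P}{k^{2}}, \qquad A \;\to\; \frac{A}{k^{2}}, \qquad \frac{P}{A} \;\to\; \text{constant},
\]

so halving the linear size (k = 2) cuts per-transistor power by a factor of 4, as stated above. Once the supply voltage can no longer be reduced, the same shrink leaves per-transistor dynamic power roughly unchanged while the area still falls by k², and power density climbs instead.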

The failure of Dennard scaling, to which the shift to multicore parts is partially a response, may soon limit multicore scaling as well. Adding more cores will not provide sufficient benefit to justify continued process scaling. If multicore scaling ceases to be the primary driver of performance gains at 16 nm, the “multicore era” will have lasted less than a decade. Clearly, architectures that move well beyond the energy/performance envelope of today’s designs will be necessary.

Radical or even incremental ideas simply cannot be developed along typical academic research and industry product cycles. We may hit a “transistor utility economics” wall in a few years, at which point Moore’s Law may end, creating massive disruptions in our industry. However, there is a great opportunity for a new generation of computer architects to deliver performance and efficiency gains that can work across a wide range of problems. It is clear now that the energy efficiency of devices is not scaling along with integration capacity, and the big opportunity today for the semiconductor industry is “Big Data” and emerging domains such as recognition, mining, and synthesis that can efficiently use a many-core chip. The objective of processing “Big Data” is to transform data into knowledge; the probabilistic nature of “Big Data” opens new frontiers and opportunities.

To ensure the appropriate operation of complex semiconductor devices under highly unreliable circumstances, a new paradigm for the architectural design, analysis, and synthesis of engineered semiconductor systems is needed. It is therefore imperative that VLSI designers build robust and novel fault tolerance into computational circuits, and that these designs have the ability to detect and recover from damage that causes the system to operate incorrectly or fail outright. Furthermore, these new generations of processors feature standard compilers and a new programmability paradigm that exploits machine learning operations for better performance and energy efficiency. Thus, our approach is an algorithmic transformation that converts diverse regions of code from a von Neumann model to a cognitive computing model, enabling much larger performance and efficiency gains.

nCell Technology: Revolutionizing Computing
 
We recently introduced a cognitive computing SoC architecture (patent pending): the idea of a self-reconfigurable machine learning processor with an innovative mechanism for ensuring appropriate operational levels during and after unexpected natural or man-made events that could impact critical engineered systems in unforeseen ways, or for taking advantage of unexpected opportunities. Self-reconfigurable cognitive computing processors refer to a system’s ability to change its structure, its operations, or both in response to an unforeseen event in order to meet its objectives. This technology is realized and advanced using a powerful, state-of-the-art computational platform and techniques, including innovative hardware architecture, software, networks, and adaptive computation, which can embed machine learning reconfigurability into complex engineered semiconductor devices. This technology will not only improve the computational efficiency of software operations, but will also provide a robustness that results in higher yields from semiconductor manufacturing.


Figure 1. nCell, a next generation of intelligent computing machines (ICM): self-programmable, high-performance, embedded cognitive computing processors. Through this revolutionary computing architecture, computers will intelligently understand data, generating countless advances in human knowledge and man-machine interaction.

To software and platform developers we are introducing a novel yet simple concept for massively parallel processor programming: the nCell transformation. The nCell transformation is an algorithmic transformation that converts regions of imperative code into cognitive computing cellular operations. Because nCell exposes considerable parallelism and consists of simple operations, it can be used efficiently as an on-chip hardware accelerator. Therefore, the nCell transformation can yield significant performance and energy improvements.

The transformation uses a training-based approach to produce a cognitive computing model that approximates the behavior of candidate code. A transformed program runs primarily on the main core and invokes the nCell structure, the cognitive processing cellular structure, to perform machine learning evaluation instead of executing the replaced code in conventional computation mode.
This transformation structurally and semantically changes the code and replaces various unstructured regions of code with structured fine-grained parallel computation (nCell).
Relaxing the requirement of perfectly precise conventional arithmetic at the device, circuit, architecture, and programming-language levels opens significant opportunities to improve performance and energy efficiency in the domains where applications call for cognitive computing. This is made possible by the evolution of the Internet of Things, machine-to-machine communication, and social networks, and the resulting generation of big data, which is central to the future of data centers.
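To make the flow described above concrete, here is a minimal C sketch of a transformed region. The runtime names (ncell_model, ncell_invoke, ncell_load_model) are hypothetical, invented purely for illustration; they are not a published nCell interface.

```c
#include <stddef.h>

/* Original candidate region: a small, frequently executed pure function. */
static float luminance(float r, float g, float b) {
    return 0.299f * r + 0.587f * g + 0.114f * b;
}

/* Hypothetical handle to a trained cognitive computing model (assumed API). */
typedef struct ncell_model ncell_model;
ncell_model *ncell_load_model(const char *path);            /* assumed */
void ncell_invoke(ncell_model *m, const float *in, size_t n_in,
                  float *out, size_t n_out);                 /* assumed */

/* Transformed region: the main core hands the inputs to the nCell structure
 * and reads back an approximated result instead of executing the original
 * arithmetic in conventional computation mode. */
static float luminance_ncell(ncell_model *m, float r, float g, float b) {
    float in[3] = { r, g, b };
    float out[1];
    ncell_invoke(m, in, 3, out, 1);
    return out[0];
}
```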

Services such as Google, YouTube, Spotify, and Netflix will be enhanced as they run intensive recommendation engines personalized for each user. Recognition systems have similar machine-learning requirements, with examples such as Apple’s Siri, Shazam, and Google Glass’s voice and visual recognition systems. 1000eyes is another example; it uses image clustering to search for products based on a photo a user submits. Data mining in a sea of Big Data such as Facebook’s repository will also benefit from cognitive computing capability, as will the complex control systems used in driverless vehicles. Machine-learning-enabled network security is also an area of huge importance.

At the architecture level, we are introducing a machine-learning instruction set that allows conventional von Neumann processors to transform precise operations into machine learning operations. The instruction set allows the compiler to convey what can be computed by nCell without necessarily specifying how, letting the microarchitecture choose from a set of machine learning operations without exposing them to software.
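The C sketch below illustrates what such a “what, not how” boundary could look like from the compiler’s point of view. The intrinsics (__ncell_config, __ncell_put, __ncell_get) and the model blob are hypothetical names used only for illustration; the real nCell instruction set is not described here.

```c
/* Model parameters emitted by the compiler's training phase (assumed). */
extern const unsigned char trained_model_blob[];

void  __ncell_config(const void *model);   /* hypothetical: select the trained model     */
void  __ncell_put(float operand);          /* hypothetical: stream one input operand      */
float __ncell_get(void);                   /* hypothetical: read back one output operand  */

float transformed_region(float a, float b, float c) {
    __ncell_config(trained_model_blob);    /* WHAT to evaluate: this model, these operands */
    __ncell_put(a);
    __ncell_put(b);
    __ncell_put(c);
    return __ncell_get();                  /* HOW it is evaluated is left to the microarchitecture */
}
```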

Figure 2. From annotated code to accelerated execution on a cognitive computing unit (nCell)-augmented core. The nCell transformation has three phases: programming, in which the programmer marks code regions to be transformed; compilation, in which the compiler selects and trains a suitable machine learning operation and replaces the original code with a machine learning invocation; and execution.
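Continuing the earlier sketch, the programming phase might look like the following. The “#pragma ncell” annotation is a hypothetical syntax chosen only to illustrate the three phases in Figure 2; it is not a documented directive.

```c
/* Programming phase: the developer marks a region whose output can
 * tolerate approximation.  "#pragma ncell" is a hypothetical annotation. */
#pragma ncell approximate
void rgb_to_gray(const float *rgb, float *gray, int n_pixels) {
    for (int i = 0; i < n_pixels; i++) {
        const float *p = &rgb[3 * i];
        gray[i] = 0.299f * p[0] + 0.587f * p[1] + 0.114f * p[2];
    }
}
/* Compilation phase (conceptually): the compiler executes the marked region
 * on representative inputs, records input/output pairs, trains a machine
 * learning model on them, and substitutes an nCell invocation.
 * Execution phase: the main core streams pixel values to the nCell
 * structure and reads back the approximated gray values. */
```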

The nCell transform is also a general and exact technique for the parallel programming of a large class of machine learning algorithms on multicore processors. The central idea of this approach is to let the programmer implement machine learning applications using the generic nCell structure rather than searching for specialized optimizations. In this form the transform does not change the underlying algorithm, so it is not an approximation but an exact implementation.
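A minimal C sketch of this exact mode of use, assuming hypothetical generic-cell primitives (ncell_map_mul, ncell_reduce_add) named only for illustration: a dot product, the inner loop of many machine learning algorithms, expressed as identical fine-grained cell operations followed by a reduction. No approximation is involved; the result is the mathematically exact dot product (up to floating-point summation order).

```c
#include <stddef.h>

/* Hypothetical generic-cell primitives (assumed for this sketch). */
void  ncell_map_mul(const float *a, const float *b, float *out, size_t n);
float ncell_reduce_add(const float *v, size_t n);

float dot(const float *w, const float *x, float *scratch, size_t n) {
    ncell_map_mul(w, x, scratch, n);      /* each cell multiplies one (w[i], x[i]) pair */
    return ncell_reduce_add(scratch, n);  /* cells combine the partial products exactly */
}
```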

Conclusion

Many in the computing community believe that exponential core scaling will continue into the hundreds or thousands of cores per chip, auguring a parallelism revolution in hardware and software. However, while transistor-count increases continue at traditional Moore’s Law rates, per-transistor speed and energy-efficiency improvements have slowed dramatically. Under these conditions, more cores are only possible if the cores are slower, simpler, or less utilized with each additional technology generation. VIMOC Technologies has introduced a disruptive innovation in the form of an IP silicon processor architecture that enables the next generation of cognitive computing SoCs, improving performance and energy efficiency. Furthermore, this technology offers significant advantages for building advanced micro-servers and mobile computing devices, forming low-cost cognitive computing datacenter infrastructure and enabling a high level of security, reliability, and big-data-driven innovative applications.

The heart of the Big Data challenge

Data is fast becoming the most important economic commodity in the 21st century. Unlike finite resources, data can be created from nothing, which explains why there has been such an explosion in data in the past two decades. The trend towards free digital products and services has also promoted monetization through the sale of the data that is generated by these products. This has provided a catalyst for the generation of even more data. It is evident that the real value in this infinite resource is the extraction of quality intelligence.

From an engineering perspective, a number of new technical challenges have emerged as a result of this progression. One is related to the sheer volume of data to be stored and processed; a number of groups are working on new developments in high-density storage to accommodate these trends. The transmission of this data from user to data center also presents challenges for the speed and bandwidth of telecommunications infrastructure. The third challenge involves the analysis and extraction of intelligence from this data. Machine learning algorithms have become key to analyzing the copious amounts of data and creating value from it. These algorithms operate above several layers of software, making the process inefficient. Under the layers of software lies the heart of the current limitation: the processor. Processors used in today’s data centers are the fastest they have ever been. As described by Moore’s Law, roughly every 18 months for the past 50 years transistor counts have doubled and feature sizes have halved, yet the architecture of the chips has remained the same. The laws of physics are now limiting how much smaller we can go, and Moore’s Law is facing a brick wall.

One constant that has remained throughout this period of rapid progress is processor architecture. The architecture of current processors is deterministic and hierarchical. The interaction between machine learning software and processors in the cloud has resulted in software complexity and, in turn, security issues. Current architectures also make scalability and multi-core interaction counter-intuitive, as well as power-hungry and physically large. To handle the rapid expansion of data volume, data centers need a high-speed, low-energy, smaller yet scalable processor. A fundamental change in processor architecture is essential.

VIMOC Technologies is redesigning the processor by rethinking how it is fundamentally structured. The aim is to build a bio-inspired chip that maps naturally onto machine learning algorithms. Rather than working through several layers of software, this processor will perform machine-learning operations at the hardware level.

Revolutionizing Data Centres

VIMOC’s Cognitive-Core technology enables a revolutionary architecture for hyper-scale computing environments such as cloud storage, analytics, web serving, and media streaming. VIMOC’s unique combination of ultra-low-power ARM processors and proprietary neuro-Cell (nCell) technology sets the foundation for the next generation of cognitive computing server designs. VIMOC’s technology is designed to sustain large-scale applications with dramatic savings in power and space compared to today’s state-of-the-art installations, and provides software developers with an efficient and flexible platform for implementing advanced machine learning algorithms.

VIMOC Technologies’ products include IP silicon Cognitive-Core processors and hardware-software platforms, which allow OEMs to bring ultra-efficient, hyper-scale solutions to market quickly.