Deep learning, the current craze, has its roots in the era of vacuum-tube computers. In 1958, Frank Rosenblatt of Cornell University, inspired by the neurons of the brain, designed the first artificial neural network, an approach later dubbed "deep learning". Rosenblatt knew the technology had outrun the computing power of his time, and he lamented that as the number of connections in a neural network grows, conventional digital computers would soon be overwhelmed by the computational load.
Fortunately, computer hardware has improved rapidly over the past few decades, making computation roughly 10 million times faster. As a result, researchers in the 21st century have been able to implement neural networks with far more connections, capable of simulating more complex phenomena. Deep learning is now widely used in many fields, including playing Go, machine translation, predicting protein folding, and analyzing medical images. Its rise has seemed unstoppable, but its future is likely to be bumpy: the computing limits that worried Rosenblatt remain a dark cloud over the field, and today deep learning is pushing against the limits of its computing tools.
Huge computing costs
A rule that applies to all statistical models is that improving performance by a factor of k requires at least k² times as much data to train the model. And because deep learning models are over-parameterized, improving performance by a factor of k requires at least k⁴ times as much computation. An exponent of 4 means that a 10,000-fold increase in computation buys at most a 10-fold improvement. Clearly, to improve the performance of deep learning models, scientists need to build larger models and train them with more data. But how expensive will that computation become? Will the cost grow so high that we cannot afford it, stalling the development of the field?
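The scaling rule above can be made concrete with a small back-of-envelope calculation. This is only an illustration of the exponents cited in the article (2 for data, 4 for compute in theory, about 9 in practice, as discussed below); the function names and baselines are hypothetical.

```python
# Illustrative sketch of the scaling rule described above.
# Exponents come from the article; everything else is an assumption.

def data_needed(k, base=1.0):
    """Training data scales at least as k**2 for a k-fold performance gain."""
    return base * k ** 2

def compute_needed(k, base=1.0, exponent=4):
    """Compute scales as k**exponent: 4 in theory, roughly 9 in practice."""
    return base * k ** exponent

k = 10  # target: a 10x performance improvement
print(data_needed(k))                  # 100.0   -> 100x more data
print(compute_needed(k))               # 10000.0 -> 10,000x more compute (theory)
print(compute_needed(k, exponent=9))   # 1e9     -> a billion-fold (observed)
```

The gap between the theoretical exponent and the observed one is what makes the cost question in the next section so pressing.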
To explore this question, scientists at the Massachusetts Institute of Technology collected data from more than 1,000 deep learning research papers on topics such as image classification, object detection, question answering, named entity recognition, and machine translation. Their research shows that deep learning faces serious challenges: "If you can't improve performance without increasing the computational burden, computational limits will bring deep learning to a standstill." Are chip performance improvements keeping pace with deep learning's demands? No. Of NASNet-A's more-than-1,000-fold increase in computation, only a factor of six came from better hardware; the rest came from using more processors or running them longer, at higher cost. Theory tells us that improving performance by a factor of k requires k⁴ times the computation, but in practice the required increase is at least k⁹.
According to the researchers' estimate of the computational cost-performance curve for image recognition, driving the error rate down to 5% would require 10²⁸ floating-point operations. Another study, from the University of Massachusetts Amherst in the US, shows the enormous economic and environmental costs this computational burden implies: training an image recognition model to below a 5% error rate would cost about $100 billion and consume enough electricity to generate as much carbon emissions as New York City produces in a month. Training an image recognition model to below a 1% error rate would be prohibitively expensive.
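To give a sense of scale for the 10²⁸ figure, here is a rough calculation of how long that many operations would take on a single accelerator. The sustained throughput used below is an assumed, illustrative number, not one taken from either study.

```python
# Back-of-envelope for the 1e28 FLOP estimate above.
TOTAL_FLOPS = 1e28        # operations to reach ~5% error (per the MIT estimate)
FLOP_PER_SEC = 1e14       # assumed sustained throughput of one modern accelerator

seconds = TOTAL_FLOPS / FLOP_PER_SEC
gpu_years = seconds / (3600 * 24 * 365)
print(f"{gpu_years:.2e} device-years")  # on the order of millions of device-years
```

Even with very generous assumptions about hardware, the result lands in the millions of device-years, which is why the dollar and carbon estimates above are so large.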
The burden of computational costs is already apparent at the frontiers of deep learning. OpenAI, a machine learning research lab, spent more than $4 million to design and train GPT-3, a deep-learning language system. Although the researchers made an error in the implementation, they did not fix it, explaining simply in an appendix to the paper that "retraining the model is not practical due to the high cost of training."
Companies, too, are starting to shy away from the computational costs of deep learning. A major European supermarket chain recently abandoned a deep-learning-based system for predicting which products customers would buy, after the company's executives judged the cost of training and running it too high.
Where is deep learning going?
In the face of rising economic and environmental costs, the field of deep learning urgently needs ways to improve performance while keeping computation under control. Researchers have explored several approaches.
One strategy is to use processors designed specifically for deep learning. Over the past decade, CPUs have given way to GPUs, field-programmable gate arrays (FPGAs), and application-specific integrated circuits (ASICs). Such specialization improves efficiency but sacrifices generality and faces diminishing returns; in the long run, we may need entirely new hardware architectures.

Another strategy for reducing the computational burden is to use smaller neural networks. This lowers the cost of each use but often raises the cost of training, and the right trade-off depends on the circumstances: a model that is used widely should prioritize a low cost per use, while a model that must be retrained constantly should prioritize training cost.

Meta-learning is also expected to reduce the training cost of deep learning. The idea is that one learning effort can be applied across multiple domains: rather than building separate systems to recognize dogs, cats, and cars, train a single recognition system and reuse it. However, researchers have found that once the training data differs even slightly from the actual deployment scenario, a meta-learning system's performance degrades severely, so a comprehensive meta-learning system may require an enormous amount of data.
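The trade-off between training cost and per-use cost described above can be sketched with a minimal cost model. All the dollar figures and usage counts here are hypothetical, chosen only to show how heavy usage shifts the balance toward the model that is cheaper to run.

```python
# Minimal sketch of the training-cost vs per-use-cost trade-off.
# All numbers are hypothetical illustrations, not figures from the article.

def total_cost(train_cost, cost_per_use, num_uses):
    """Lifetime cost = one-time training cost + cost of every query served."""
    return train_cost + cost_per_use * num_uses

num_uses = 10 ** 9  # a widely deployed model serving a billion queries

# Large model: expensive to train, cheap per query.
big = total_cost(train_cost=1_000_000, cost_per_use=0.01, num_uses=num_uses)
# Small model: cheap to train, but each query costs more to serve well.
small = total_cost(train_cost=100_000, cost_per_use=0.05, num_uses=num_uses)

print(big, small)  # at this usage level, the bigger training bill pays off
```

With heavy usage the per-query term dominates, which is why the article says widely used models should prioritize the cost of use, while frequently retrained models should prioritize training cost.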
Some undiscovered or underappreciated types of machine learning may also reduce computation. For example, machine learning systems built on expert insights are more efficient, but if the experts cannot identify all the contributing factors, such systems will not match deep learning systems. Technologies still under development, such as neuro-symbolic methods, promise to better combine the knowledge of human experts with the reasoning power of neural networks.

Just as Rosenblatt sensed in the early days of neural networks, today's deep learning researchers are beginning to face the limitations of their computational tools. If we cannot change the way we approach deep learning, we must face a future of slow progress under the dual pressures of economy and environment. What we can hope for instead is a breakthrough in algorithms or hardware that allows flexible and powerful deep learning models to continue to evolve and remain within our reach.
(By Zheng Yuhong, Global Science)