Cooperation with Intel®: Quantization of ML Models and Performance Boost in Pre-Processing
scieneers is an AI Specialist Partner of the semiconductor manufacturer Intel®. In real-world deployment scenarios, we test how Intel’s latest technologies and tools can further improve the performance of analytical models and of computations on large data sets.
Real-time and streaming applications are becoming increasingly popular because of their enormous cross-domain potential for generating company-wide added value. Their core requirements keep growing: ever lower latencies and computing times are expected while application complexity increases. Horizontal and vertical scaling of the resources in use is a common solution. However, scaling resources inevitably increases the costs of operation and provisioning, and these costs rarely grow linearly. In addition to rising costs, the growing number and use of resources also has a negative effect on a company’s sustainability. Of particular concern is the ever-increasing demand for energy associated with the production and operation of such computing resources. ML models from the field of deep learning, which can capture the complexity of such data-driven real-time applications, require especially powerful resources to handle the disproportionately greater computing effort and time for pre-processing, training and inference. With the AI Analytics Toolkit and the Neural Compressor, Intel® provides two technologies to address the problem described.
“The Intel® AI Analytics Toolkit provides data scientists and AI developers with familiar Python tools to accelerate execution of end-to-end ML pipelines on Intel architectures. The toolkit thus enables an increase in performance from preprocessing to machine learning and offers interoperability for efficient model development.”
“With the Intel® Neural Compressor, Intel provides an optimization tool as an open-source Python library with functionalities for network compression, such as quantization and pruning, as well as for knowledge distillation. The tooling has uniform interfaces to all relevant deep learning frameworks and can be run on Intel CPUs and GPUs.”
As part of our cooperation with Intel®, we were able to test the presented technologies extensively on services of the Microsoft Azure Cloud. A first iteration took place in close cooperation with our customer STEAG New Energies, which, as an established energy service provider, is already successfully using ML applications in production across the value chain. Among other models, complex neural networks are used to produce heat demand forecasts in order to generate added value on the energy market and, by optimizing the operation of plants, to contribute to more sustainable energy production. This involves the use of powerful CPU and GPU resources for data pre-processing, training the ML models and generating those forecasts in live operation. More details here.
By using the Intel® AI Analytics Toolkit, we measurably reduced the processor time (CPU time) required for data pre-processing. The Neural Compressor delivered particularly impressive results. Post-training quantization (compression) of the model reduced the precision of the model parameters so that the model required a factor of 4 less memory. Likewise, the network’s latency (inference time) was reduced by a factor of 3 while model accuracy was maintained. The results clearly demonstrate the value of model quantization and its direct, positive effect on the business context: the use of costly resources is reduced and performance improves, while model accuracy is preserved.
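To illustrate why reducing parameter precision can shrink a model by roughly a factor of 4, here is a minimal sketch of symmetric post-training quantization written with the Python standard library only. It is not the Intel® Neural Compressor API and not the procedure used in the project. It assumes, purely for illustration, that the original parameters are stored as 32-bit floats and are mapped to 8-bit integers. The function names `quantize_int8` and `dequantize` and the sample weights are hypothetical.

```python
from array import array


def quantize_int8(weights):
    """Map float weights to int8 using one shared scale (symmetric, per tensor)."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 if all weights are zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range [-127, 127].
    quantized = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float32 weights from int8 values and the scale."""
    return array("f", (v * scale for v in quantized))


# Hypothetical float32 weights standing in for a layer of a trained model.
weights = array("f", [0.5, -1.27, 0.003, 1.0, -0.75, 0.2, 0.0, 1.27])

q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)

# Memory footprint before and after quantization, in bytes.
fp32_bytes = len(weights) * weights.itemsize
int8_bytes = len(q_weights) * q_weights.itemsize
compression_ratio = fp32_bytes / int8_bytes

# Largest absolute deviation introduced by rounding to int8 steps.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"float32: {fp32_bytes} bytes, int8: {int8_bytes} bytes")
print(f"compression ratio: {compression_ratio}")
print(f"scale: {scale}, max rounding error: {max_error}")
```

The rounding error per weight is bounded by about half a quantization step (`scale / 2`), which is one reason accuracy can often be maintained. A production tool would typically also calibrate on representative data and verify accuracy after quantization.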
Sven Weisse
Account Executive Industry Partner at Intel Corporation
“It’s impressive to see the level of expertise, the clear customer dedication, and the willingness to improve your solutions to achieve the highest level of business impact for the customer. Working with your engineering teams was smooth, fast, trustful, and open. Testing new ideas and applying optimized solutions has happened in hours.”
As a service provider and developer of data-driven applications, we look forward to further increasing the added value for our customers in cooperation with Intel®. Especially with regard to machine learning operations, we are looking forward to quantizing larger models such as the transformer architectures that are currently indispensable in image and natural language processing, in order to further expand our practical experience with Intel®’s AI Analytics Toolkit and Neural Compressor.
Martin Danner & Stefan Kirner