Training AI models is an extremely time-consuming process. Without an efficient way to migrate model training onto large, distributed clusters, training projects can drag on for months. To help address these issues, Samsung SDS has developed the Brightics AI Accelerator. The Kubernetes container-based application is now available in the NVIDIA NGC Catalog.
The Samsung SDS Brightics AI Accelerator automates machine learning, speeds up model training, and improves model accuracy with key features such as automated feature engineering, model selection, and hyper-parameter tuning, all without requiring infrastructure development and deployment expertise. Brightics AI Accelerator can be used in many industries, such as healthcare, manufacturing, retail, and automotive, and across use cases spanning computer vision, natural language processing, and more.
Key Features and Benefits:
● Is use-case agnostic, supporting the training of all AI model types by applying AutoML to tabular (CSV), time-series, image, or natural language data to enable analytics; image classification, detection, and segmentation; and NLP use cases.
● Offers model portability between cloud and on-premises data centers and provides a unified interface for orchestrating large distributed clusters to train deep learning models with the TensorFlow, Keras, and PyTorch frameworks, as well as AutoML with scikit-learn.
● AutoML software automates and accelerates model training on tabular data using automated model selection from scikit-learn, automated feature synthesis, and hyper-parameter search optimization.
● Automated Deep Learning (AutoDL) software automates and accelerates deep learning model training using data-parallel, distributed synchronous ring-allreduce (Horovod) with the Keras, TensorFlow, and PyTorch frameworks and minimal code changes. AutoDL can exploit up to 512 GPUs per training job, producing a model in 1 hour versus 3 weeks using traditional methods.
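The ring-allreduce exchange that Horovod performs under the hood can be illustrated with a small, pure-Python simulation (this is only a sketch of the communication pattern, not Brightics or Horovod code): each worker's gradient vector is split into N chunks, which circulate around the ring twice, once to sum them (scatter-reduce) and once to distribute the sums (allgather).

```python
def ring_allreduce(worker_grads):
    """Simulate ring-allreduce across N workers arranged in a ring.

    Each worker holds a gradient vector; after the exchange, every
    worker holds the elementwise sum of all vectors. The vector is
    split into N chunks, and in every round each worker passes one
    chunk to its right-hand neighbor.
    """
    n = len(worker_grads)
    size = len(worker_grads[0])
    assert size % n == 0, "illustration assumes length divisible by N"
    step = size // n
    grads = [list(g) for g in worker_grads]  # per-worker working copies

    def seg(c):  # element indices of chunk c
        return range(c * step, (c + 1) * step)

    # Phase 1: scatter-reduce. In round r, worker w sends chunk (w - r)
    # to worker (w + 1), which adds it into its own copy. After n - 1
    # rounds, worker w holds the fully summed chunk (w + 1) % n.
    for r in range(n - 1):
        sent = [[grads[w][i] for i in seg((w - r) % n)] for w in range(n)]
        for w in range(n):
            src = (w - 1) % n
            c = (src - r) % n
            for j, i in enumerate(seg(c)):
                grads[w][i] += sent[src][j]

    # Phase 2: allgather. Circulate each finished chunk around the ring
    # so every worker ends up with all fully summed chunks.
    for r in range(n - 1):
        sent = [[grads[w][i] for i in seg((w + 1 - r) % n)] for w in range(n)]
        for w in range(n):
            src = (w - 1) % n
            c = (src + 1 - r) % n
            for j, i in enumerate(seg(c)):
                grads[w][i] = sent[src][j]

    return grads

result = ring_allreduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(result[0])  # [12, 15, 18] — every worker holds the same sums
```

Because each worker only ever sends one chunk per round, the data each node transmits is nearly independent of the number of workers, which is why this pattern scales well to large GPU counts.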
Figure 1. Training one iteration of a ResNet-50 image classification model on a typical 8-GPU machine takes about 504 hours. Using the enhanced inter-GPU communication of Brightics AI Accelerator on 128 GPUs speeds up training 126x, to about 4 hours.
Get started today by pulling Samsung's Brightics AI Accelerator from the NGC Catalog.
Brightics AI Accelerator AutoDL eliminates per-job software installation and configuration and offers a painless experience provisioning, running, monitoring, and cleaning up jobs. For computer vision projects, AutoDL shrinks the entire lifecycle from up to 9 months to only a couple of weeks on a properly sized cluster using auto-training and grid-search-based hyper-parameter optimization. AutoDL includes Auto Transfer Learning, which fits image data to all models in the model zoo with a hyper-parameter search. On the Hugging Face DistilBERT natural language processing (NLP) benchmark, AutoDL produced super-linear results, reducing training time 13x while increasing GPU resources only 8x.
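Grid-search hyper-parameter optimization of the kind AutoDL automates can be sketched in a few lines of plain Python. This is a minimal illustration of the technique, not the Brightics API; the `toy_train` scoring function is purely hypothetical and stands in for a real training-and-validation run.

```python
from itertools import product

def grid_search(train_fn, param_grid):
    """Exhaustively evaluate every hyper-parameter combination and
    return the best-scoring one (higher score is better)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = train_fn(**params)  # train a model, return validation score
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical "training" function: score peaks at lr=0.1, depth=4.
def toy_train(lr, depth):
    return -abs(lr - 0.1) - abs(depth - 4) * 0.01

best, score = grid_search(toy_train, {"lr": [0.01, 0.1, 1.0],
                                      "depth": [2, 4, 8]})
print(best)  # {'lr': 0.1, 'depth': 4}
```

In practice each grid point is an independent training job, which is what makes this search embarrassingly parallel and a natural fit for a large GPU cluster.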
Figure 2. The time to complete three training epochs of a DistilBERT model under several different setups. On a typical 8-GPU machine, training takes about 400 minutes. The enhanced inter-GPU communication of Brightics AI Accelerator lowers this to about 200 minutes on the same hardware, and scaling to 32 and 64 GPUs further lowers it to 32 minutes. In total, this represents a 13x speed-up compared to linear scaling on ordinary infrastructure.