How containerization brings AI to your DevOps pipeline

James Kobielus, Lead Analyst, SiliconANGLE Wikibon

Developing applications for the cloud today means building containerized microservices. And increasingly, artificial intelligence (AI)—grounded in machine-learning (ML) models—is at the core of those cloud applications.

Development tool vendors have recognized the need to build and deploy containerized AI/ML models within cloud applications. They have responded by building in support for containerization—specifically, within Docker images that are orchestrated via Kubernetes. They also support programming these applications using languages such as Python, Java, and R.

Here's what application developers and IT professionals need to understand about what AI is, how it relates to DevOps, and how containerization enables DevOps pipelines to deploy AI apps into cloud-computing environments. Plus, I review one new open source tool, Kubeflow, and discuss how to integrate AI-based DevOps tools and projects into existing continuous integration/continuous deployment (CI/CD) environments.


Why you need AI in your DevOps pipelines

AI is at the heart of modern applications, and data scientists are pivotal developers in this new world. More developers have begun to incorporate AI—typically in the form of ML or deep learning (DL)—into their cloud services initiatives.

AI uses artificial neural networks and related algorithms to infer correlations and patterns in datasets. When incorporated into statistical models and used to automate the distillation of insights from big data, AI can achieve impressive results. Common applications include predictive analytics, e-commerce recommendation engines, embedded mobile chatbots, automated face recognition, and image-based search.

AI adoption can get complicated

To be effective at their intended tasks, AI-infused cloud apps require more than just the right data to build and train these models. Any organization that hopes to harness AI must also have developers who have mastered the tools and skills of data science. In addition, a sustainable AI development practice requires incorporating specialized methodologies, high-performance computing clusters, and complex workflows into enterprise development practices.

Increasingly, enterprises are aligning their AI development practices with their existing enterprise DevOps methodologies. This enables AI models to be built, deployed, and iterated in the same CI/CD environment as the program code, application programming interfaces (APIs), user experience designs, and other application artifacts.

Within DevOps processes, data scientists are the ones who build, train, and test AI models against actual data in the application domain of interest. This ensures that the resulting applications are fit for the purposes for which they have been built.

AI developers also must keep re-evaluating and retraining their models against fresh data over an application’s life. This ensures that the models can continue to do their jobs—such as recognizing faces, predicting events, and inferring customer intentions—with acceptable accuracy.
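The retraining loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a production pattern: the one-dimensional threshold "classifier," the sample data, and the `min_accuracy` cutoff are all hypothetical stand-ins for a real model and real monitoring metrics.

```python
# Sketch of continuous retraining: periodically re-evaluate a deployed model
# against fresh data and retrain it when accuracy drops below a threshold.
# The 1-D threshold classifier here is a toy stand-in for a real AI model.

def mean(xs):
    return sum(xs) / len(xs)

def train(samples):
    """Fit a threshold classifier: predict 1 when x >= threshold.
    Places the threshold midway between the two class means."""
    positives = [x for x, label in samples if label == 1]
    negatives = [x for x, label in samples if label == 0]
    return (mean(positives) + mean(negatives)) / 2

def accuracy(threshold, samples):
    correct = sum((x >= threshold) == bool(label) for x, label in samples)
    return correct / len(samples)

def maybe_retrain(threshold, fresh_samples, min_accuracy=0.9):
    """One iteration of the monitoring loop a DevOps pipeline would schedule."""
    if accuracy(threshold, fresh_samples) < min_accuracy:
        return train(fresh_samples), True   # model has drifted: retrain
    return threshold, False                 # model is still acceptable

# Model trained on old data; then the data distribution drifts upward.
model = train([(0.0, 0), (1.0, 0), (3.0, 1), (4.0, 1)])   # threshold = 2.0
fresh = [(3.0, 0), (4.0, 0), (6.0, 1), (7.0, 1)]
model, retrained = maybe_retrain(model, fresh)             # retrains: new threshold = 5.0
```

In a real pipeline, `maybe_retrain` would be a scheduled job that pulls labeled samples from the data lake, and a retrained model would flow back through the same CI/CD gates as any other artifact.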

The tools you'll need

Within a CI/CD practice, the DevOps environment should automate AI pipeline activities to the maximum extent possible throughout the application lifecycle. This requires an investment in several critical platforms and tools.

Source-control repository 

This is where you store, manage, and control all models, code, and other AI pipeline artifacts through every step of the DevOps lifecycle. The repository serves as the hub for collaboration, reuse, and sharing of all pipeline artifacts by all involved development and operations professionals.

Data lake 

This is where you store, aggregate, and prepare data for use in exploration, modeling, and training throughout the AI DevOps pipeline. Typically, the data lake is a distributed file system, such as the Hadoop Distributed File System (HDFS), that stores multi-structured data in its original formats to facilitate data exploration, modeling, and training by AI developers.

Integrated collaboration environment 

This is the workbench in which AI DevOps professionals execute all or most pipeline functions. It provides a unified platform for source discovery, visualization, exploration, data preparation, statistical modeling, training, deployment, evaluation, sharing, and reuse. Most such environments embed popular AI modeling frameworks such as TensorFlow, Caffe, PyTorch, and MXNet.


Adopt containerization in your AI development 

Developing AI applications for the cloud requires building this functionality into containerized microservices. This involves using Python, Java, and other languages to incorporate AI and other application logic into Docker images that can be orchestrated via Kubernetes or other cloud-services orchestration backbones.

To effectively develop AI microservices, developers must factor the underlying application capabilities into modular building blocks that can be deployed into cloud-native environments with minimal binding among resources. In a cloud services environment, you containerize and orchestrate AI microservices dynamically within lightweight interoperability fabrics.

Typically, each containerized AI microservice exposes an independent, programmable API, which enables you to easily reuse, evolve, or replace it without compromising interoperability. Each containerized AI microservice may be implemented using different programming languages, algorithm libraries, cloud databases, and other enabling back-end infrastructure.
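To make this concrete, here is a minimal sketch of such a microservice using only Python's standard library. The `/predict` endpoint, the feature vector format, and the hard-coded "model" weights are illustrative assumptions; a real service would load a trained model and typically use a production framework and server.

```python
# Minimal sketch of a containerized AI microservice exposing a JSON API.
# The predict() "model", the /predict route, and port 8080 are assumptions
# for illustration, not details from any particular product.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    """Stand-in for a trained model: a weighted sum of input features."""
    weights = [0.5, -0.2, 0.1]  # hypothetical learned weights
    return sum(w * x for w, x in zip(weights, features))

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"score": predict(payload["features"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep container stdout quiet in this sketch

# Inside a Docker image, this would be the container entrypoint
# (e.g. CMD ["python", "serve.py"]), with port 8080 exposed:
#     HTTPServer(("0.0.0.0", 8080), InferenceHandler).serve_forever()
```

Because the service speaks plain HTTP and JSON, the model behind `predict()` can be retrained, rewritten in another framework, or replaced entirely without breaking callers—exactly the loose coupling the microservices approach is meant to provide.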

AI DevOps tools are coming to market in droves

To address these requirements in repeatable DevOps pipelines, enterprise development teams are adopting a new generation of data science development workbenches. These incorporate CI/CD functionality and integrate with existing enterprise investments in big data platforms, high-performance computing clusters, low-code tools, and other essential infrastructure.

Commercial AI DevOps tools come from public cloud providers, including Alibaba Cloud, Amazon Web Services, Microsoft, Google, IBM, and Oracle. AI tools are also available from established big data analytics solution vendors, including Alteryx, Cloudera, Databricks, KNIME, MapR, Micro Focus, Nvidia, RapidMiner, and SAS Institute.

Also, there are a wide range of specialized startups in this market segment, including Agile Stacks, Anaconda, Dataiku, DataKitchen, DataRobot, Domino Data Lab, H2O.ai, Hydrosphere.io, Kogentix, ParallelM, Pipeline.ai, PurePredictive, Seldon, Tellmeplus, Weaveworks, and Xpanse AI.

Kubeflow's place in this world

Increasingly, the tools provide the ability to deploy containerized AI microservices over Kubernetes orchestration backbones that span public, private, hybrid, multi-cloud, and even edge environments. 

Recognizing the need for standards in this regard, the AI community has in the past year coalesced around an open-source project that automates the AI DevOps pipeline over Kubernetes clusters. Developed by Google and launched in late 2017, Kubeflow provides a framework-agnostic pipeline for making AI microservices production-ready across multi-framework, multi-cloud computing environments.

Kubeflow supports the entire DevOps lifecycle for containerized AI. It simplifies the creation of production-ready AI microservices, ensures the mobility of containerized AI apps among Kubernetes clusters, and supports scaling of AI DevOps workloads to any cluster size.

It's designed to support any workload in the end-to-end AI DevOps pipeline, ranging from up-front data preparation to iterative modeling and training, all the way to downstream serving, evaluation, and management of containerized AI microservices.

But Kubeflow is far from mature and has been adopted in only a handful of commercial AI workbench and DevOps product offerings. Early adopters of Kubeflow include Agile Stacks, Alibaba Cloud, Amazon Web Services, Google, H2O.ai, IBM, NVIDIA, and Weaveworks.

How to get started

Developing AI apps for containers in the cloud requires expert personnel, sophisticated tooling, scalable cloud platforms, and efficient DevOps workflows. To recap, enterprise application development and operations professionals who want to bring AI development fully into their cloud-computing initiatives should heed the following advice:

  • Align your AI application development practices with your existing enterprise DevOps methodologies. This will allow you to build, deploy, and iterate ML, DL, and other statistical models in the same CI/CD environment as your program code, application programming interfaces (APIs), user experience designs, and other application artifacts.
  • Provide AI application DevOps teams with a shared collaboration workbench. This will allow for data preparation, statistical modeling, training, deployment, and the refinement of models, code, APIs, containerized microservices, and other development artifacts.
  • Ensure that your AI DevOps workflow supports continuous retraining of deployed AI models against fresh data over an application's life. This will ensure that AI-infused applications continue to do their designated tasks—such as recognizing faces, predicting events, and inferring customer intents—with acceptable accuracy.
  • Manage AI DevOps workflows from a source control repository. This will serve as the hub for collaboration, versioning, reuse, and sharing of all pipeline artifacts by all participants.

Most important of all, bring data scientists fully into your application development organizations and DevOps practices. They are skilled professionals who have the expertise to build, train, test, deploy, and manage AI models that are anchored in the actual data in the application domains of interest.