Ibrar Aziz - Technology Enthusiast

Posts

Your GPU cluster is running a hotel. Most engineers are still managing it like a house rental

Here's a stat that should bother every platform architect: enterprise GPU utilization commonly sits at 20-30%. Not because demand is low, but because the allocation model underneath most AI platforms was built for a world where one workload gets one whole GPU, full stop.

Red Hat OpenShift AI just closed that gap architecturally. Dynamic Resource Allocation (DRA) went GA in OpenShift 4.21, and it quietly rewrites how GPUs get shared across a cluster. If you're designing AI infrastructure in 2026, this is the shift to understand.

The old model: renting the whole house

For years, Kubernetes handled GPUs through the Device Plugin framework. A GPU request was just an integer: "give me 1 GPU." No nuance about size, memory, or whether a fraction would do. To get any sharing at all, teams used static MIG (Multi-Instance GPU) configuration: an admin pre-carves a physical GPU i...
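The integer-only request described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the helper name `gpu_pod_manifest`, the pod name, and the image reference are hypothetical, and the only real identifier is `nvidia.com/gpu`, the extended resource name exposed by the NVIDIA device plugin. It shows why a "fraction of a GPU" cannot be expressed in this model: extended resources accept whole numbers only.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int) -> dict:
    """Build a minimal Pod manifest that requests whole GPUs (device-plugin model).

    Hypothetical helper for illustration only.
    """
    # Extended resources such as nvidia.com/gpu must be whole numbers,
    # so a request like 0.5 has no valid representation here.
    if isinstance(gpus, bool) or not isinstance(gpus, int) or gpus < 1:
        raise ValueError("device-plugin GPU requests must be a positive whole number")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,  # placeholder image reference (assumption)
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = gpu_pod_manifest("trainer", "example.invalid/trainer:latest", 1)
    print(json.dumps(manifest, indent=2))
```

Whatever the workload actually needs, the scheduler sees only the count, which is one plausible reason a small inference job can end up holding an entire GPU.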

Building Enterprise AI Agents on Red Hat OpenShift: From Prototype to Production

Everyone is building AI demos. The real challenge is building AI agents that are secure, scalable, and production-ready. This is where Red Hat OpenShift AI and Kubernetes provide a significant advantage. Instead of treating AI as a standalone application, OpenShift enables AI agents to run as cloud-native workloads with enterprise-grade security, automation, and observability.

How to Build an AI Agent on OpenShift

1. Select the Foundation Model
Choose an LLM such as Llama, Mistral, Granite, or another enterprise model, and deploy it using OpenShift AI model serving.

2. Create the AI Agent
Use frameworks like LangGraph, LangChain, CrewAI, or Semantic Kernel to define the agent's reasoning, memory, and workflow.

3. Connect Enterprise Data
Integrate the agent with internal APIs, databases, vector databases for RAG, document repositories, and knowledge bases. This allows the agent to answer using your organization's data rather than relying only on pretrained knowledge.

4. Containerize t...

OpenShift AI: Building an Enterprise AI Platform, Not Just Running Models

Many organizations begin their AI journey by deploying notebooks or running a few models on GPUs. While this may work for experimentation, enterprise AI requires a platform that is secure, scalable, governed, and repeatable. This is where OpenShift AI changes the conversation. Rather than treating AI as isolated workloads, OpenShift AI integrates data science, model training, model serving, governance, and MLOps into a unified Kubernetes-native platform.

Why OpenShift AI?

An enterprise AI platform must support multiple teams, projects, and environments without sacrificing security or operational control. OpenShift AI provides collaborative data science workbenches, GPU-enabled model training, scalable model serving, integration with CI/CD pipelines, multi-user isolation, enterprise security and RBAC, and monitoring and lifecycle management. This allows organizations to move from isolated AI experiments to production-ready AI services.

Key Prerequisites

A successful OpenShift AI deployment ...

Scaling AI Infrastructure on OpenShift: Building More Than Just a GPU Cluster

As organizations race to adopt AI, many focus on acquiring the latest GPUs. But in practice, successful AI platforms are built on much more than powerful hardware. Scaling AI infrastructure requires treating GPUs as a shared, cloud-native resource, managed with the same discipline as compute, storage, and networking. Platforms such as OpenShift enable this transformation by providing orchestration, security, and lifecycle management for enterprise AI workloads.

1. Start with the Right Foundation

Before deploying a single AI workload, validate the infrastructure: GPU architecture (H100, Blackwell, etc.); high-core-count CPUs and adequate system memory; high-speed networking (25/100/200/400 GbE, or InfiniBand where applicable); fast NVMe storage for datasets and model checkpoints; Kubernetes/OpenShift version compatibility; and supported NVIDIA driver, CUDA, and GPU Operator versions.

A mismatch between hardware, drivers, and Kubernetes versions often becomes the biggest deployment challenge, not the...

TKGM PR-DR SITE ON VCLOUD DIRECTOR ARCHITECTURE

You build vSphere + vCD + NSX-T + CSE on both sites. You deploy TKGm clusters on primary. You set up Velero to back up YAMLs and volumes. You mirror the Harbor registry to DR. You test restoring a cluster on the DR site using CSE + Velero. You prepare DNS (manual or automated) to point to DR when needed.

Primary & DR Site Layer Comparison Table

Layer | Primary Site Component | DR Site | What Happens During DR? | Notes / Tools
1️⃣ Infrastructure | vSphere (ESXi, vCenter) | Same setup | DR vSphere takes over | Ensure hardware compatibility
2️⃣ Networking | NSX-T | Same NSX-T setup | DR NSX routes traffic | Replicate NSX segments, edge configs
3️⃣ Cloud Management | vCloud Director | vCloud Director | DR vCD deploys new VMs | Must sync templates across sites
4️⃣ K8s Provisioning | CSE (TKGm enabled) | CSE (same version) | DR CSE deploys TKGm cluster | Sync catalog/templates
5️⃣ Kubernetes Cluster | TKGm Cluster (Running) | TKGm Cluster (Rebuilt) | Apps are restore...

Managing AI Workloads in Kubernetes and OpenShift with Modern GPUs [NVIDIA H100/H200]

AI workloads demand significant computational resources, especially for training large models or performing real-time inference. Modern GPUs like NVIDIA's H100 and H200 are designed to handle these demands effectively, but maximizing their utilization requires careful management. This article explores strategies for managing AI workloads in Kubernetes and OpenShift with GPUs, focusing on features like MIG (Multi-Instance GPU), time slicing, MPS (Multi-Process Service), and vGPU (Virtual GPU). Practical examples are included to make these concepts approachable and actionable.

1. Why GPUs for AI Workloads?

GPUs are ideal for AI workloads due to their massive parallelism and ability to perform complex computations faster than CPUs. However, these resources are expensive, so efficient utilization is crucial. Modern GPUs like NVIDIA H100/H200 come with features like:

MIG (Multi-Instance GPU): Partitioning a single GPU into smaller instances.
Time slicing: Efficiently sharing GPU res...