Hazem Awadallah

Senior Systems Engineer, KDI SSD Product Engineering · Kingston Technology · California

[email protected] · +1 (714) 350-3482 · LinkedIn · GitHub · MLCommons · Writing

Summary

Senior Systems Engineer at Kingston Technology. 13 years in enterprise storage; the last 2 years building AI inference systems. Authored the MLCommons MLPerf Storage v3.0 KV Cache Benchmark (PR #270, March 2026). Built and operate Kingston's on-prem LLM platform, fine-tuned a 20B-parameter model, and designed, built, and automated SSD validation test suites for Kingston Channel SSDs and tier-1 hyperscaler partners.

Selected Work

Experience

Senior Systems Engineer, KDI SSD Product Engineering

Kingston Technology · Storage for AI & LLM Inference · Fountain Valley, CA · July 2019 – Present
  • Authored the MLPerf Storage v3.0 KV Cache Benchmark in the MLCommons TF_KVCache working group. PR #270 merged to main in March 2026.
  • On H100 NVL with vLLM 0.18 and Qwen2.5-72B FP8, NVMe KV offload lifted serving throughput 71–82% under 128-concurrency, 12K-context stress.
  • Published Kingston's PCIe Gen5 NVMe for AI whitepaper. One Gen5 SSD sustained 11.6 GB/s feeding 127 A100s at 91.6% accelerator utilization on ResNet-50.
  • Led Kingston's first MLPerf Storage v2 CLOSED submission across Dell Gen4, Lenovo Gen5, and Supermicro Gen5 platforms.
  • Built and operate Nebula, an on-prem LLM platform on 8 GPUs (6×A40 + 2×A100) serving 53 engineers across three continents.
  • Architected end-to-end AI infrastructure on Nebula: a RAG pipeline tuned for engineering accuracy, a custom evaluation framework, and agent codebases for code review (10 languages, with verified auto-fix loops), content generation, QoS reporting, and semiconductor intelligence.
  • Fine-tuned (LoRA) domain-specific KDI Spec Expert model variants that scored 4.88–4.90/5.0 on 23 expert NVMe questions, benchmarked against o1-mini and Claude Sonnet 4, with zero technical hallucinations on SSD specification reasoning.
  • Built Nebula's production observability stack (Loki, Promtail, Tempo, OpenTelemetry, Grafana) with Okta SSO, CrowdStrike, and Fail2ban security integration.
  • Delivered enterprise-grade AI, including domain-specific fine-tunes unavailable commercially, at $330/month operating cost versus $9K+/month for equivalent commercial alternatives.
  • Developed 40+ custom tools and a 227-prompt library across 19 categories; managed automated analytics, backup, and disaster-recovery infrastructure for the platform.
  • Led technical knowledge transfer and documentation as the platform transitioned toward formal IT governance.
  • Built an internal storage and memory benchmark for local client-based AI systems (~23,500 LOC, 18 workloads, dual-platform Windows / Linux).
  • Designed, built, and automated Kingston's enterprise SSD validation ecosystem: 50+ automated test suites covering RAID certification, VMware vSAN, NVMe protocol conformance, thermal validation, power-loss protection, JEDEC JESD219A endurance, and SNIA Enterprise PTS. This ecosystem is the foundation of Kingston's ability to ship datacenter SSDs.
  • Built the QoS Cloud benchmarking suite: 180+ real-workload scenarios stressing IOPS, bandwidth, and P99.99 / P99.999 tail latency across the DC SSD product line, with a SQL backend and dashboard reporting for tier-1 customer engagements.
  • Built the AWS-partner SSD qualification framework (Python + Ansible + Flask UI). Single-command qualification across hundreds of hosts.
  • Resolved 32-second write-latency tail stalls across an 8,000-device European hyperscaler fleet via blktrace replay; the firmware fix dropped P99.9999 latency from 3,060 ms to 164 ms.
  • Designed Kingston's Quarch power-loss qualification suite. Discovered the CC.SHN graceful-shutdown sequence, distinguishing graceful from ungraceful shutdowns via the SMART unsafe-shutdowns counter.
  • Established the Broadcom 94XX/95XX RAID/HBA qualification path and Red Hat RHEL 8 certification for 20+ Kingston datacenter SSDs.

Senior Systems Engineer II, Storage Architecture

GEICO · Chevy Chase, MD · September 2016 – July 2019
  • Co-owned 50 PB of distributed storage across US datacenters on a six-engineer team backing all of GEICO's financial apps, customer databases, and compliance systems.
  • Led a 2,600+ volume migration across IBM SVC with zero downtime under 99.99% uptime contracts; multi-month cutover against live financial operations.
  • Cut manual storage operations 80% via Python + Bash automation: NPIV zone configuration across dual SAN fabrics, volume mirroring, fleet-scale provisioning and decommission.
  • Primary escalation engineer across IBM SVC / FlashSystem / A9000R / XIV, EMC VMAX, HP 3PAR 8200, Brocade DCX; ran zero-data-loss recovery on complete backend array outages.
  • Owned cross-regional datacenter migrations and DR architecture on VMware Site Recovery Manager for mission-critical financial workloads.

Systems Engineer, IBM GPS

IBM · Supporting US Fortune 500 enterprise clients · December 2012 – March 2016
  • Level-2 escalation engineer for US Fortune 500 storage outages across DS series, Storwize, SVC, FlashSystem, XIV, EMC VMAX, HP 3PAR 8200, Brocade DCX, and Cisco MDS SAN; multipathing diagnostics across AIX (SDD/SDDPCM) and Windows (SDDDSM) under outage pressure.
  • Trained at IBM Dallas on SVC/Storwize/SAN/AIX and at IBM San Jose on Tivoli Storage Productivity Center; authored an internal SAN troubleshooting wiki and mentored new engineers.

Publications

Projects

Skills

AI & Inference

  • vLLM, llm-d, KV cache offloading, prefix caching, paged attention
  • LoRA / QLoRA fine-tuning (Unsloth), gpt-oss family, GGUF, custom Modelfiles
  • Llama 3, Qwen 2.5/3 (incl. 3-VL, 3-coder-next), gpt-oss MoE, DeepSeek-V3 MLA, Mistral, GLM, Devstral
  • Native function / tool calling, multi-step ReAct workflows, auto-agent loops, agent harness design
  • RAG: Apache Tika, BGE-M3, ChromaDB, vLLM rerankers (bge-reranker-v2-m3), hybrid search
  • Multi-modal: vision (Gemini 3 Pro, Qwen3-VL, devstral-vision), image gen (Flux, Nano Banana), Kokoro TTS
  • Ollama (incl. Ollama API integrations), Open WebUI (incl. Open WebUI API integrations and upstream contributor PR #18095), FastAPI; Vertex AI Model Garden, Zero-Data Retention
  • Custom tool / MCP development against Ollama and Open WebUI (schema-coerced JSON args, parallel dispatch, ReAct loops with budgeted retries)
  • MLCommons MLPerf Storage v2 / v3 (TF_KVCache working group member, author of v3 KV Cache benchmark)
  • Inference performance optimization for concurrency and scaling: KV-cache budgeting, context-window sizing, GPU memory partitioning, request-routing, NVMe-tiered KV offload

Storage & Systems

  • NVMe internals (1.4+), SR-IOV, FDP, NVMe-oF, TCG Opal 2.0, JEDEC JESD219A, SNIA Enterprise PTS
  • VMware vSAN / vSphere / ESXi, Proxmox KVM, VFIO passthrough, TrueNAS SCALE / ZFS, Ceph
  • IBM SVC / Storwize / FlashSystem, EMC VMAX, HP 3PAR, NetApp; Brocade DCX, Cisco MDS, multipathing
  • OEM AVL & qualification: Broadcom, VMware, Red Hat, Phison, SMI, Memblaze, Dell

Performance & Tooling

  • blktrace, bpftrace, fio, and Linux performance telemetry tools (iostat, vmstat, nmon, perf)
  • AI observability: Loki, Promtail, Tempo, OpenTelemetry, Grafana
  • D2C / Q2D latency decomposition, P99.99 / P99.999 tail-latency analytics
  • Quarch QTL1431 / QTL2266 / Torridon power-loss, CC.SHN graceful shutdown
  • Workload distillation: trace → bssplit → iodepth → thinktime fio replay (distill_fio.py)
  • TC-08 thermocouples, normalized T1/T2 thermal analysis, IPMI / RACADM telemetry
  • SPEC-style geomean benchmarking, real-time disk-I/O monitoring, cache-leak detection

Programming

  • Python (production): MLPerf v3 KV Cache benchmark (13 modules, 4,013-line test suite); internal storage/memory client-AI benchmark (~23,500 LOC); Nebula tool-calling harness (~1,000 LOC, no LangChain); AWS validation framework; pandas-based blktrace analysis
  • Python frameworks: FastAPI, Flask, Plotly/Dash, pytest, Ansible, PyInstaller cross-platform binaries
  • Bash (deep): system-level scripting, automation, and validation orchestration across thousands of hosts; PowerShell, SQL; eBPF (bpftrace)
  • Open-source contributions: Open WebUI PR #18095; MLCommons MLPerf Storage v3 KV Cache Benchmark (author)

Cloud & Platforms

  • AWS (tier-1 SSD validation partner), Microsoft Azure, Google Vertex AI Model Garden, MinIO S3
  • Docker, Git, GitHub Actions, Azure DevOps pipelines
  • Linux KVM, Proxmox, VFIO passthrough; Ollama, Open WebUI, vLLM serving

Education

BSc, Communication Engineering

German University in Cairo · 2006 – 2011