Hazem Awadallah

Senior Systems Engineer, KDI SSD Product Engineering · Kingston Technology · California

[email protected] · +1 (714) 350-3482 · LinkedIn · GitHub · MLCommons · Writing

Summary

Senior Systems Engineer at Kingston Technology with 13 years in enterprise storage, the last two building AI inference systems. Authored the MLCommons MLPerf Storage v3.0 KV Cache Benchmark (PR #270, March 2026). Built and operate Kingston's on-prem LLM platform, fine-tuned a 20B-parameter model, and designed, built, and automated SSD validation test suites for Kingston Channel SSDs and tier-1 hyperscaler partners.

Selected Work

MLPerf Storage v3.0 KV Cache Benchmark. MLCommons TF_KVCache working group, PR #270 merged March 2026. github.com/mlcommons/storage
Extending LLM Inference Capacity with NVMe Storage. Kingston whitepaper, in review. H100 + vLLM 0.18 + Qwen2.5-72B FP8; NVMe KV offload lifted throughput 71–82% under 128-concurrency stress.
PCIe Gen5 NVMe for AI. Published Kingston whitepaper, 2025. One Gen5 SSD sustained 11.6 GB/s feeding 127 A100s at 91.6% accelerator utilization on ResNet-50. media.kingston.com
Nebula: Kingston's on-prem LLM platform. Open WebUI + Ollama on 8 GPUs serving 53 engineers across three continents. RAG pipeline, 40+ custom tools, Okta SSO + RBAC, Loki / Promtail / Tempo / OpenTelemetry / Grafana observability. $330/month vs $9K+/month commercial equivalent.
Storage & memory benchmark for local client-based AI systems. Internal Kingston tool, unreleased. ~23,500 production LOC, 18 workloads, dual-platform Windows / Linux.
KDI Spec Expert: LoRA fine-tunes for SSD spec reasoning. Domain-specific model variants on 135,517 internal documents. Scored 4.88–4.9/5.0 against o1-mini and Claude Sonnet 4 on 23 expert NVMe questions with zero technical hallucinations.

Experience

Senior Systems Engineer, KDI SSD Product Engineering

Kingston Technology · Storage for AI & LLM Inference · Fountain Valley, CA · July 2019 – Present

Authored the MLPerf Storage v3.0 KV Cache Benchmark in the MLCommons TF_KVCache working group. PR #270 merged to main in March 2026.
On H100 NVL with vLLM 0.18 and Qwen2.5-72B FP8, NVMe KV offload lifted serving throughput 71–82% under 128-concurrency, 12K-context stress.
Published Kingston's PCIe Gen5 NVMe for AI whitepaper. One Gen5 SSD sustained 11.6 GB/s feeding 127 A100s at 91.6% accelerator utilization on ResNet-50.
Led Kingston's first MLPerf Storage v2 CLOSED submission across Dell Gen4, Lenovo Gen5, and Supermicro Gen5 platforms.
Built and operate Nebula, an on-prem LLM platform on 8 GPUs (6×A40 + 2×A100) serving 53 engineers across three continents.
Architected end-to-end AI infrastructure on Nebula: RAG pipeline tuned for engineering accuracy, custom evaluation framework, and agent codebases for code review across 10 languages with verified auto-fix loops, content generation, QoS reporting, and semiconductor intelligence.
LoRA fine-tuned domain-specific KDI Spec Expert model variants scoring 4.88–4.9/5.0 against o1-mini and Claude Sonnet 4 on 23 expert NVMe questions, with zero technical hallucinations on SSD specification reasoning.
Built Nebula's production observability stack (Loki, Promtail, Tempo, OpenTelemetry, Grafana) with Okta SSO, CrowdStrike, and Fail2ban security integration.
Delivered enterprise-grade AI — including domain-specific fine-tunes unavailable commercially — at $330/month operating cost, versus $9K+/month for equivalent commercial alternatives.
Developed 40+ custom tools and a 227-prompt library across 19 categories; managed automated analytics, backup, and disaster-recovery infrastructure for the platform.
Led technical knowledge transfer and documentation as the platform transitioned toward formal IT governance.
Built an internal storage and memory benchmark for local client-based AI systems (~23,500 LOC, 18 workloads, dual-platform Windows / Linux).
Designed, built, and automated Kingston's enterprise SSD validation ecosystem: 50+ automated test suites covering RAID certification, VMware vSAN, NVMe protocol conformance, thermal validation, power-loss protection, JEDEC JESD219A endurance, and SNIA Enterprise PTS. This ecosystem is the foundation of Kingston's ability to ship datacenter SSDs.
Built the QoS Cloud benchmarking suite: 180+ real-workload scenarios stressing IOPS, bandwidth, and P99.99 / P99.999 tail latency across the DC SSD product line, with SQL backend and dashboard reporting for tier-1 customer engagements.
Built the AWS-partner SSD qualification framework (Python + Ansible + Flask UI). Single-command qualification across hundreds of hosts.
Resolved 32-second write-latency tail stalls across an 8,000-device European hyperscaler fleet via blktrace replay; the firmware fix dropped P99.9999 latency from 3,060 ms to 164 ms.
Designed Kingston's Quarch power-loss qualification suite. Characterized the CC.SHN graceful-shutdown sequence, using the SMART unsafe-shutdowns counter to distinguish graceful from ungraceful shutdowns.
Established Broadcom 94XX/95XX RAID/HBA qualification path and Red Hat RHEL 8 certification for 20+ Kingston datacenter SSDs.

Senior Systems Engineer II, Storage Architecture

GEICO · Chevy Chase, MD · September 2016 – July 2019

Co-owned 50 PB of distributed storage across US datacenters on a six-engineer team backing all of GEICO's financial apps, customer databases, and compliance systems.
Led a 2,600+ volume migration across IBM SVC with zero downtime under 99.99% uptime contracts; multi-month cutover against live financial operations.
Cut manual storage operations 80% via Python + Bash automation: NPIV zone configuration across dual SAN fabrics, volume mirroring, fleet-scale provisioning and decommission.
Primary escalation engineer across IBM SVC / FlashSystem / A9000R / XIV, EMC VMAX, HP 3PAR 8200, Brocade DCX; ran zero-data-loss recovery on complete backend array outages.
Owned cross-regional datacenter migrations and DR architecture on Site Recovery Manager for mission-critical financial workloads.

Systems Engineer, IBM GPS

IBM · Supporting US Fortune 500 enterprise clients · December 2012 – March 2016

Level-2 escalation engineer for US Fortune 500 storage outages across DS series, Storwize, SVC, FlashSystem, XIV, EMC VMAX, HP 3PAR 8200, Brocade DCX and Cisco MDS SAN; multipathing diagnostics across AIX (SDD/SDDPCM) and Windows (SDDDSM) under outage pressure.
Trained at IBM Dallas on SVC/Storwize/SAN/AIX and at IBM San Jose on Tivoli Storage Productivity Center; authored an internal SAN troubleshooting wiki and mentored new engineers.

Publications

2026 · MLPerf Storage v3.0 KV Cache Benchmark. MLCommons (PR #270, March 2026). github.com/mlcommons/storage
2026 · Extending LLM Inference Capacity with NVMe Storage. Kingston Technology. In review.
2025 · PCIe Gen5 NVMe for AI: MLPerf Storage v2 Findings on the DC3000ME. Kingston Technology. media.kingston.com
2025 · Kingston DC3000ME MLPerf Storage v2.0 Submission. MLCommons CLOSED division. official results
2022 · DC600M SSD Efficiency with VMware vSAN. Kingston Technology. kingston.com
2020 · Resilient, Responsive Databases: Kingston DC1500M Enterprise NVMe SSD. Kingston Technology. kingston.com
2011 · A Compact Symmetric U-Shaped Monopole for Ultra-Wide-Band Operation. IEEE NSRC 2011. IEEE Xplore
2010 · A Compact Symmetric Branched-Chain Monopole for Dual Wide-Band Operation. IEEE MMS 2010. IEEE Xplore

Projects

Nebula. On-prem LLM platform on 8 GPUs serving 53 engineers across three continents. RAG pipeline, 40+ custom tools and a 227-prompt library across 19 categories, Okta SSO + RBAC, Loki / Promtail / Tempo / OpenTelemetry / Grafana observability. $330/month operating cost vs $9K+/month commercial equivalent.
KDI Spec Expert. LoRA-fine-tuned model variants on 135,517 internal documents. Scored 4.88–4.9/5.0 against o1-mini and Claude Sonnet 4 on 23 expert NVMe questions, zero technical hallucinations.
Storage & memory benchmark for local client-based AI systems. Internal Kingston tool, unreleased. ~23,500 LOC, 18 workloads, dual-platform Windows / Linux.
nebula-chat. Portable Win + Linux CLI agent against Ollama with 29 built-in tools and a multi-step auto-agent mode.
Annexia escalation. Diagnosed and resolved 32-second tail-latency stalls on an 8,000-device hyperscaler fleet. P99.9999 latency dropped to 164 ms.
Dell-OEM Quarch power-loss qualification. Six scripts, five-phase suite. Documented the HBA-vs-direct-PCIe gap and the CC.SHN graceful-shutdown sequence.
fdp-flash-gui. Interactive NVMe FDP simulator. GitHub
rag-ollama. Local-only RAG framework for on-prem use. GitHub

Skills

AI & Inference

vLLM, llm-d, KV cache offloading, prefix caching, paged attention
LoRA / QLoRA fine-tuning (Unsloth), gpt-oss family, GGUF, custom Modelfiles
Llama 3, Qwen 2.5/3 (incl. 3-VL, 3-coder-next), GPT-OSS MoE, DeepSeek-V3 MLA, Mistral, GLM, Devstral
Native function / tool calling, multi-step ReAct workflows, auto-agent loops, agent harness design
RAG: Apache Tika, BGE-M3, ChromaDB, vLLM rerankers (bge-reranker-v2-m3), hybrid search
Multi-modal: vision (Gemini 3 Pro, Qwen3-VL, devstral-vision), image gen (Flux, Nano Banana), Kokoro TTS
Ollama (incl. Ollama API integrations), Open WebUI (incl. Open WebUI API integrations and upstream contributor PR #18095), FastAPI; Vertex AI Model Garden, Zero-Data Retention
Custom tool / MCP development against Ollama and Open WebUI (schema-coerced JSON args, parallel dispatch, ReAct loops with budgeted retries)
MLCommons MLPerf Storage v2 / v3 (TF_KVCache working group member, author of v3 KV Cache benchmark)
Inference performance optimization for concurrency and scaling: KV-cache budgeting, context-window sizing, GPU memory partitioning, request-routing, NVMe-tiered KV offload

Storage & Systems

NVMe internals (1.4+), SR-IOV, FDP, NVMe-oF, TCG Opal 2.0, JEDEC JESD219A, SNIA Enterprise PTS
VMware vSAN / vSphere / ESXi, Proxmox KVM, VFIO passthrough, TrueNAS SCALE / ZFS, Ceph
IBM SVC / Storwize / FlashSystem, EMC VMAX, HP 3PAR, NetApp; Brocade DCX, Cisco MDS, multipathing
OEM AVL & qualification: Broadcom, VMware, Red Hat, Phison, SMI, Memblaze, Dell

Performance & Tooling

blktrace, bpftrace, fio, and Linux performance telemetry tools (iostat, vmstat, nmon, perf)
AI observability: Loki, Promtail, Tempo, OpenTelemetry, Grafana
D2C / Q2D latency decomposition, P99.99 / P99.999 tail-latency analytics
Quarch QTL1431 / QTL2266 / Torridon power-loss, CC.SHN graceful shutdown
Workload distillation: trace → bssplit → iodepth → thinktime fio replay (distill_fio.py)
TC-08 thermocouples, normalized T1/T2 thermal analysis, IPMI / RACADM telemetry
SPEC-style geomean benchmarking, real-time disk-I/O monitoring, cache-leak detection

Programming

Python (production): MLPerf v3 KV Cache benchmark (13 modules, 4,013-line test suite); internal storage/memory client-AI benchmark (~23,500 LOC); Nebula tool-calling harness (~1,000 LOC, no LangChain); AWS validation framework; pandas-based blktrace analysis
Python frameworks: FastAPI, Flask, Plotly/Dash, pytest, Ansible, PyInstaller cross-platform binaries
Bash (deep): system-level scripting, automation, and validation orchestration across thousands of hosts; PowerShell, SQL; eBPF (bpftrace)
Open-source contributions: Open WebUI PR #18095; MLCommons MLPerf Storage v3 KV Cache Benchmark (author)

Cloud & Platforms

AWS (tier-1 SSD validation partner), Microsoft Azure, Google Vertex AI Model Garden, MinIO S3
Docker, Git, GitHub Actions, Azure DevOps pipelines
Linux KVM, Proxmox, VFIO passthrough; Ollama, Open WebUI, vLLM serving

Education

BSc, Communication Engineering

German University in Cairo · 2006 – 2011