[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"kb-article-red-hat-s-llm-d-joins-cncf-kubernetes-native-llm-inference-at-scale-en":3,"ArticleBody_sZ31yyqQcGFZR1CNkSnSnKzb1kPr5rspwIGncimc":89},{"article":4,"relatedArticles":58,"locale":48},{"id":5,"title":6,"slug":7,"content":8,"htmlContent":9,"excerpt":10,"category":11,"tags":12,"metaDescription":10,"wordCount":13,"readingTime":14,"publishedAt":15,"sources":16,"sourceCoverage":42,"transparency":43,"seo":47,"language":48,"featuredImage":49,"featuredImageCredit":50,"isFreeGeneration":54,"trendSlug":42,"trendSnapshot":42,"niche":55,"geoTakeaways":42,"geoFaq":42,"entities":42},"69cc1b240e6c02b7816bdd82","Red Hat’s llm-d Joins CNCF: Kubernetes-Native LLM Inference at Scale","red-hat-s-llm-d-joins-cncf-kubernetes-native-llm-inference-at-scale","Red Hat’s contribution of llm-d to the CNCF Sandbox makes Kubernetes a first-class platform for LLM inference, not just a “good enough” runtime.[1]  \n\nBy treating accelerators, topology, and KV cache as programmable resources, llm-d turns existing Kubernetes clusters into shared AI fabrics instead of isolated inference stacks.[4][7]\n\n💡 **Key idea:** llm-d makes LLM inference a cloud native workload governed by open standards and CNCF processes, not vendor-specific systems.[1]\n\n---\n\n## 1. Why llm-d Matters for Kubernetes and CNCF\n\nllm-d’s CNCF Sandbox status anchors LLM inference in neutral, open governance similar to Kubernetes itself.[1]\n\n- Ensures APIs, patterns, and scheduling semantics evolve under Linux Foundation stewardship.  
\n- Reduces lock-in risk versus proprietary inference platforms.\n\nThe project’s origins highlight broad neutrality:\n\n- Launched in May 2025 by Red Hat, Google Cloud, IBM Research, CoreWeave, and NVIDIA.[1]  \n- Expanded to AMD, Cisco, Hugging Face, Intel, Lambda, Mistral AI, and universities.[1][10]  \n- Signals alignment on a shared, Kubernetes-native inference approach.\n\n⚡ **Strategic shift:** Designed for “any model, any accelerator, any cloud,” targeting heterogeneous, multi-cloud clusters with GPUs, TPUs, and custom ASICs.[1][3]\n\nllm-d is:\n\n- A vehicle to evolve Kubernetes into state-of-the-art AI infrastructure.[1]  \n- Focused on production serving: performance per dollar, multi-tenancy, and SLOs.[7][9]  \n- Aimed at platform\u002FDevOps teams, not just researchers.\n\n💼 **Section takeaway:** With llm-d in CNCF, Kubernetes becomes the default place to standardize LLM serving, scheduling, and optimization across vendors and clouds.\n\n---\n\n## 2. Core Architecture: Distributed Inference Built for Kubernetes\n\nllm-d provides a Kubernetes-native architecture for distributed inference, built on vLLM plus an inference scheduler, cache-aware routing, and disaggregated serving.[2][7] It embeds into Kubernetes rather than replacing it.\n\n### Disaggregated prefill and decode\n\nInference is split into two phases:\n\n- **Prefill:** Compute-heavy, builds KV cache for input tokens.  
\n- **Decode:** Memory-bandwidth-bound, consumes KV cache to generate tokens.[8]\n\nllm-d can run these on different replicas and accelerator types, so GPUs are used where they matter instead of over-provisioning every pod.[3][8]\n\n\u003Cdiv class=\"mermaid-diagram not-prose my-6\" role=\"img\" aria-label=\"Diagram\">\n\u003Csvg id=\"diagram-1775215254784\" width=\"100%\" xmlns=\"http:\u002F\u002Fwww.w3.org\u002F2000\u002Fsvg\" class=\"flowchart\" style=\"max-width: 1338.734375px;\" viewBox=\"0 0 1338.734375 95\" role=\"graphics-document document\" aria-roledescription=\"flowchart-v2\">\u003Cstyle>#diagram-1775215254784{font-family:system-ui,-apple-system,sans-serif;font-size:16px;fill:#333;}@keyframes edge-animation-frame{from{stroke-dashoffset:0;}}@keyframes dash{to{stroke-dashoffset:0;}}#diagram-1775215254784 .edge-animation-slow{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 50s linear infinite;stroke-linecap:round;}#diagram-1775215254784 .edge-animation-fast{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 20s linear infinite;stroke-linecap:round;}#diagram-1775215254784 .error-icon{fill:#552222;}#diagram-1775215254784 .error-text{fill:#552222;stroke:#552222;}#diagram-1775215254784 .edge-thickness-normal{stroke-width:1px;}#diagram-1775215254784 .edge-thickness-thick{stroke-width:3.5px;}#diagram-1775215254784 .edge-pattern-solid{stroke-dasharray:0;}#diagram-1775215254784 .edge-thickness-invisible{stroke-width:0;fill:none;}#diagram-1775215254784 .edge-pattern-dashed{stroke-dasharray:3;}#diagram-1775215254784 .edge-pattern-dotted{stroke-dasharray:2;}#diagram-1775215254784 .marker{fill:#333333;stroke:#333333;}#diagram-1775215254784 .marker.cross{stroke:#333333;}#diagram-1775215254784 svg{font-family:system-ui,-apple-system,sans-serif;font-size:16px;}#diagram-1775215254784 p{margin:0;}#diagram-1775215254784 .label{font-family:system-ui,-apple-system,sans-serif;color:#333;}#diagram-1775215254784 .cluster-label 
text{fill:#333;}#diagram-1775215254784 .cluster-label span{color:#333;}#diagram-1775215254784 .cluster-label span p{background-color:transparent;}#diagram-1775215254784 .label text,#diagram-1775215254784 span{fill:#333;color:#333;}#diagram-1775215254784 .node rect,#diagram-1775215254784 .node circle,#diagram-1775215254784 .node ellipse,#diagram-1775215254784 .node polygon,#diagram-1775215254784 .node path{fill:#ECECFF;stroke:#9370DB;stroke-width:1px;}#diagram-1775215254784 .rough-node .label text,#diagram-1775215254784 .node .label text,#diagram-1775215254784 .image-shape .label,#diagram-1775215254784 .icon-shape .label{text-anchor:middle;}#diagram-1775215254784 .node .katex path{fill:#000;stroke:#000;stroke-width:1px;}#diagram-1775215254784 .rough-node .label,#diagram-1775215254784 .node .label,#diagram-1775215254784 .image-shape .label,#diagram-1775215254784 .icon-shape .label{text-align:center;}#diagram-1775215254784 .node.clickable{cursor:pointer;}#diagram-1775215254784 .root .anchor path{fill:#333333!important;stroke-width:0;stroke:#333333;}#diagram-1775215254784 .arrowheadPath{fill:#333333;}#diagram-1775215254784 .edgePath .path{stroke:#333333;stroke-width:1px;}#diagram-1775215254784 .flowchart-link{stroke:#333333;fill:none;}#diagram-1775215254784 .edgeLabel{background-color:rgba(232,232,232, 0.8);text-align:center;}#diagram-1775215254784 .edgeLabel p{background-color:rgba(232,232,232, 0.8);}#diagram-1775215254784 .edgeLabel rect{opacity:0.5;background-color:rgba(232,232,232, 0.8);fill:rgba(232,232,232, 0.8);}#diagram-1775215254784 .labelBkg{background-color:rgba(232, 232, 232, 0.5);}#diagram-1775215254784 .cluster rect{fill:#ffffde;stroke:#aaaa33;stroke-width:1px;}#diagram-1775215254784 .cluster text{fill:#333;}#diagram-1775215254784 .cluster span{color:#333;}#diagram-1775215254784 div.mermaidTooltip{position:absolute;text-align:center;max-width:200px;padding:2px;font-family:system-ui,-apple-system,sans-serif;font-size:12px;background:hsl(80, 100%, 
96.2745098039%);border:1px solid #aaaa33;border-radius:2px;pointer-events:none;z-index:100;}#diagram-1775215254784 .flowchartTitleText{text-anchor:middle;font-size:18px;fill:#333;}#diagram-1775215254784 rect.text{fill:none;stroke-width:0;}#diagram-1775215254784 .icon-shape,#diagram-1775215254784 .image-shape{background-color:rgba(232,232,232, 0.8);text-align:center;}#diagram-1775215254784 .icon-shape p,#diagram-1775215254784 .image-shape p{background-color:rgba(232,232,232, 0.8);padding:2px;}#diagram-1775215254784 .icon-shape .label rect,#diagram-1775215254784 .image-shape .label rect{opacity:0.5;background-color:rgba(232,232,232, 0.8);fill:rgba(232,232,232, 0.8);}#diagram-1775215254784 .label-icon{display:inline-block;height:1em;overflow:visible;vertical-align:-0.125em;}#diagram-1775215254784 .node .label-icon path{fill:currentColor;stroke:revert;stroke-width:revert;}#diagram-1775215254784 .node .neo-node{stroke:#9370DB;}#diagram-1775215254784 [data-look=\"neo\"].node rect,#diagram-1775215254784 [data-look=\"neo\"].cluster rect,#diagram-1775215254784 [data-look=\"neo\"].node polygon{stroke:#9370DB;filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215254784 [data-look=\"neo\"].node path{stroke:#9370DB;stroke-width:1px;}#diagram-1775215254784 [data-look=\"neo\"].node .outer-path{filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215254784 [data-look=\"neo\"].node .neo-line path{stroke:#9370DB;filter:none;}#diagram-1775215254784 [data-look=\"neo\"].node circle{stroke:#9370DB;filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215254784 [data-look=\"neo\"].node circle .state-start{fill:#000000;}#diagram-1775215254784 [data-look=\"neo\"].icon-shape .icon{fill:#9370DB;filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215254784 [data-look=\"neo\"].icon-shape .icon-neo path{stroke:#9370DB;filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215254784 
:root{--mermaid-font-family:system-ui,-apple-system,sans-serif;}\u003C\u002Fstyle>\u003Cg>\u003Cmarker id=\"diagram-1775215254784_flowchart-v2-pointEnd\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"5\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"8\" markerHeight=\"8\" orient=\"auto\">\u003Cpath d=\"M 0 0 L 10 5 L 0 10 z\" class=\"arrowMarkerPath\" style=\"stroke-width: 1; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215254784_flowchart-v2-pointStart\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"4.5\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"8\" markerHeight=\"8\" orient=\"auto\">\u003Cpath d=\"M 0 5 L 10 10 L 10 0 z\" class=\"arrowMarkerPath\" style=\"stroke-width: 1; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215254784_flowchart-v2-pointEnd-margin\" class=\"marker flowchart-v2\" viewBox=\"0 0 11.5 14\" refX=\"11.5\" refY=\"7\" markerUnits=\"userSpaceOnUse\" markerWidth=\"10.5\" markerHeight=\"14\" orient=\"auto\">\u003Cpath d=\"M 0 0 L 11.5 7 L 0 14 z\" class=\"arrowMarkerPath\" style=\"stroke-width: 0; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215254784_flowchart-v2-pointStart-margin\" class=\"marker flowchart-v2\" viewBox=\"0 0 11.5 14\" refX=\"1\" refY=\"7\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11.5\" markerHeight=\"14\" orient=\"auto\">\u003Cpolygon points=\"0,7 11.5,14 11.5,0\" class=\"arrowMarkerPath\" style=\"stroke-width: 0; stroke-dasharray: 1, 0;\">\u003C\u002Fpolygon>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215254784_flowchart-v2-circleEnd\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"11\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11\" markerHeight=\"11\" orient=\"auto\">\u003Ccircle cx=\"5\" cy=\"5\" r=\"5\" class=\"arrowMarkerPath\" style=\"stroke-width: 1; stroke-dasharray: 1, 
0;\">\u003C\u002Fcircle>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215254784_flowchart-v2-circleStart\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"-1\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11\" markerHeight=\"11\" orient=\"auto\">\u003Ccircle cx=\"5\" cy=\"5\" r=\"5\" class=\"arrowMarkerPath\" style=\"stroke-width: 1; stroke-dasharray: 1, 0;\">\u003C\u002Fcircle>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215254784_flowchart-v2-circleEnd-margin\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refY=\"5\" refX=\"12.25\" markerUnits=\"userSpaceOnUse\" markerWidth=\"14\" markerHeight=\"14\" orient=\"auto\">\u003Ccircle cx=\"5\" cy=\"5\" r=\"5\" class=\"arrowMarkerPath\" style=\"stroke-width: 0; stroke-dasharray: 1, 0;\">\u003C\u002Fcircle>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215254784_flowchart-v2-circleStart-margin\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"-2\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"14\" markerHeight=\"14\" orient=\"auto\">\u003Ccircle cx=\"5\" cy=\"5\" r=\"5\" class=\"arrowMarkerPath\" style=\"stroke-width: 0; stroke-dasharray: 1, 0;\">\u003C\u002Fcircle>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215254784_flowchart-v2-crossEnd\" class=\"marker cross flowchart-v2\" viewBox=\"0 0 11 11\" refX=\"12\" refY=\"5.2\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11\" markerHeight=\"11\" orient=\"auto\">\u003Cpath d=\"M 1,1 l 9,9 M 10,1 l -9,9\" class=\"arrowMarkerPath\" style=\"stroke-width: 2; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215254784_flowchart-v2-crossStart\" class=\"marker cross flowchart-v2\" viewBox=\"0 0 11 11\" refX=\"-1\" refY=\"5.2\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11\" markerHeight=\"11\" orient=\"auto\">\u003Cpath d=\"M 1,1 l 9,9 M 10,1 l -9,9\" class=\"arrowMarkerPath\" style=\"stroke-width: 2; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker 
id=\"diagram-1775215254784_flowchart-v2-crossEnd-margin\" class=\"marker cross flowchart-v2\" viewBox=\"0 0 15 15\" refX=\"17.7\" refY=\"7.5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"12\" markerHeight=\"12\" orient=\"auto\">\u003Cpath d=\"M 1,1 L 14,14 M 1,14 L 14,1\" class=\"arrowMarkerPath\" style=\"stroke-width: 2.5;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215254784_flowchart-v2-crossStart-margin\" class=\"marker cross flowchart-v2\" viewBox=\"0 0 15 15\" refX=\"-3.5\" refY=\"7.5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"12\" markerHeight=\"12\" orient=\"auto\">\u003Cpath d=\"M 1,1 L 14,14 M 1,14 L 14,1\" class=\"arrowMarkerPath\" style=\"stroke-width: 2.5; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cg class=\"root\">\u003Cg class=\"clusters\">\u003C\u002Fg>\u003Cg class=\"edgePaths\">\u003Cpath d=\"M176.391,35L180.557,35C184.724,35,193.057,35,200.724,35C208.391,35,215.391,35,218.891,35L222.391,35\" id=\"diagram-1775215254784-L_A_B_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_A_B_0\" data-points=\"W3sieCI6MTc2LjM5MDYyNSwieSI6MzV9LHsieCI6MjAxLjM5MDYyNSwieSI6MzV9LHsieCI6MjI2LjM5MDYyNSwieSI6MzV9XQ==\" data-look=\"classic\" marker-end=\"url(#diagram-1775215254784_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003Cpath d=\"M427.125,35L431.292,35C435.458,35,443.792,35,451.458,35C459.125,35,466.125,35,469.625,35L473.125,35\" id=\"diagram-1775215254784-L_B_C_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_B_C_0\" data-points=\"W3sieCI6NDI3LjEyNSwieSI6MzV9LHsieCI6NDUyLjEyNSwieSI6MzV9LHsieCI6NDc3LjEyNSwieSI6MzV9XQ==\" data-look=\"classic\" marker-end=\"url(#diagram-1775215254784_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003Cpath 
d=\"M640.016,35L644.182,35C648.349,35,656.682,35,664.349,35C672.016,35,679.016,35,682.516,35L686.016,35\" id=\"diagram-1775215254784-L_C_D_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_C_D_0\" data-points=\"W3sieCI6NjQwLjAxNTYyNSwieSI6MzV9LHsieCI6NjY1LjAxNTYyNSwieSI6MzV9LHsieCI6NjkwLjAxNTYyNSwieSI6MzV9XQ==\" data-look=\"classic\" marker-end=\"url(#diagram-1775215254784_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003Cpath d=\"M863.078,35L867.245,35C871.411,35,879.745,35,887.411,35C895.078,35,902.078,35,905.578,35L909.078,35\" id=\"diagram-1775215254784-L_D_E_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_D_E_0\" data-points=\"W3sieCI6ODYzLjA3ODEyNSwieSI6MzV9LHsieCI6ODg4LjA3ODEyNSwieSI6MzV9LHsieCI6OTEzLjA3ODEyNSwieSI6MzV9XQ==\" data-look=\"classic\" marker-end=\"url(#diagram-1775215254784_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003Cpath d=\"M1090.016,35L1094.182,35C1098.349,35,1106.682,35,1114.349,35C1122.016,35,1129.016,35,1132.516,35L1136.016,35\" id=\"diagram-1775215254784-L_E_F_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_E_F_0\" data-points=\"W3sieCI6MTA5MC4wMTU2MjUsInkiOjM1fSx7IngiOjExMTUuMDE1NjI1LCJ5IjozNX0seyJ4IjoxMTQwLjAxNTYyNSwieSI6MzV9XQ==\" data-look=\"classic\" marker-end=\"url(#diagram-1775215254784_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003C\u002Fg>\u003Cg class=\"edgeLabels\">\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_A_B_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 
1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_B_C_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_C_D_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_D_E_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_E_F_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel 
\">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"nodes\">\u003Cg class=\"node default  \" id=\"diagram-1775215254784-flowchart-A-0\" data-look=\"classic\" transform=\"translate(92.1953125, 35)\">\u003Crect class=\"basic label-container\" style=\"\" x=\"-84.1953125\" y=\"-27\" width=\"168.390625\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"\" transform=\"translate(-54.1953125, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"108.390625\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"nodeLabel \">\u003Cp>Client Request\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215254784-flowchart-B-1\" data-look=\"classic\" transform=\"translate(326.7578125, 35)\">\u003Crect class=\"basic label-container\" style=\"\" x=\"-100.3671875\" y=\"-27\" width=\"200.734375\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"\" transform=\"translate(-70.3671875, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"140.734375\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"nodeLabel \">\u003Cp>Inference Gateway\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215254784-flowchart-C-3\" data-look=\"classic\" transform=\"translate(558.5703125, 35)\">\u003Crect class=\"basic label-container\" style=\"fill:#22c55e !important\" x=\"-81.4453125\" y=\"-27\" width=\"162.890625\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"color:#fff 
!important\" transform=\"translate(-51.4453125, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"102.890625\" height=\"24\">\u003Cdiv style=\"color: rgb(255, 255, 255) !important; display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\" xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\">\u003Cspan style=\"color:#fff !important\" class=\"nodeLabel \">\u003Cp>Prefill Servers\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215254784-flowchart-D-5\" data-look=\"classic\" transform=\"translate(776.546875, 35)\">\u003Crect class=\"basic label-container\" style=\"\" x=\"-86.53125\" y=\"-27\" width=\"173.0625\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"\" transform=\"translate(-56.53125, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"113.0625\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"nodeLabel \">\u003Cp>KV Cache Store\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215254784-flowchart-E-7\" data-look=\"classic\" transform=\"translate(1001.546875, 35)\">\u003Crect class=\"basic label-container\" style=\"fill:#0ea5e9 !important\" x=\"-88.46875\" y=\"-27\" width=\"176.9375\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"color:#fff !important\" transform=\"translate(-58.46875, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"116.9375\" height=\"24\">\u003Cdiv style=\"color: rgb(255, 255, 255) !important; display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\" xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\">\u003Cspan 
style=\"color:#fff !important\" class=\"nodeLabel \">\u003Cp>Decode Servers\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215254784-flowchart-F-9\" data-look=\"classic\" transform=\"translate(1235.375, 35)\">\u003Crect class=\"basic label-container\" style=\"\" x=\"-95.359375\" y=\"-27\" width=\"190.71875\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"\" transform=\"translate(-65.359375, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"130.71875\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"nodeLabel \">\u003Cp>Response Stream\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003C\u002Fg>\u003C\u002Fg>\u003C\u002Fg>\u003Cdefs>\u003Cfilter id=\"diagram-1775215254784-drop-shadow\" height=\"130%\" width=\"130%\">\u003CfeDropShadow dx=\"4\" dy=\"4\" stdDeviation=\"0\" flood-opacity=\"0.06\" flood-color=\"#000000\">\u003C\u002FfeDropShadow>\u003C\u002Ffilter>\u003C\u002Fdefs>\u003Cdefs>\u003Cfilter id=\"diagram-1775215254784-drop-shadow-small\" height=\"150%\" width=\"150%\">\u003CfeDropShadow dx=\"2\" dy=\"2\" stdDeviation=\"0\" flood-opacity=\"0.06\" flood-color=\"#000000\">\u003C\u002FfeDropShadow>\u003C\u002Ffilter>\u003C\u002Fdefs>\u003Ctext x=\"1333.734375\" y=\"90\" text-anchor=\"end\" fill=\"#6b7280\" stroke=\"#ffffff\" stroke-width=\"3\" paint-order=\"stroke\" font-size=\"11\" font-family=\"system-ui, sans-serif\" opacity=\"0.7\">coreprose.com\u003C\u002Ftext>\u003C\u002Fsvg>\n\u003C\u002Fdiv>\n\n📊 **Architecture insight:** Disaggregation replaces “one big GPU per pod” with a tunable pipeline per phase, workload, and accelerator.[3][8]\n\n### Integration with Inference Gateway\n\nllm-d integrates with the 
Kubernetes Inference Gateway (IGW):[2][7]\n\n- Applications call a stable gateway API.  \n- Platform teams optimize routing, placement, and scaling internally.  \n- Models, policies, and accelerator layouts can change without touching app code.\n\n### Topology-aware scheduling\n\nThe scheduler understands:\n\n- GPU peer-to-peer connectivity  \n- NUMA layout and local memory bandwidth  \n- Network fabrics and cross-node bandwidth[3][10]\n\nUsing this topology, llm-d:\n\n- Routes requests to meet latency SLOs at the lowest cost.  \n- Avoids naive balancing by CPU or generic utilization.[3][10]\n\nGuides and Helm recipes provide “well-lit paths” for deploying llm-d across tens or hundreds of nodes, single- or multi-tenant.[9]\n\n⚠️ **Section takeaway:** llm-d makes inference architecture a native Kubernetes concern, combining vLLM, IGW, and topology-aware scheduling into a reproducible stack.\n\n---\n\n## 3. Performance and Cost Optimizations for Enterprise LLMs\n\nllm-d focuses on the levers that determine whether LLMs are economically viable at scale.\n\n### KV-cache-aware routing\n\nKV-cache-aware routing sends follow-up or similar prompts to cache-warm nodes, avoiding repeated prefill work.[2][7]\n\n- Especially valuable for multi-step prompts, agents, and RAG.  
\n- Reduces tail latency and jitter.\n\n\u003Cdiv class=\"mermaid-diagram not-prose my-6\" role=\"img\" aria-label=\"Diagram\">\n\u003Csvg id=\"diagram-1775215255436\" width=\"100%\" xmlns=\"http:\u002F\u002Fwww.w3.org\u002F2000\u002Fsvg\" class=\"flowchart\" style=\"max-width: 924.671875px;\" viewBox=\"0 0 924.671875 199\" role=\"graphics-document document\" aria-roledescription=\"flowchart-v2\">\u003Cstyle>#diagram-1775215255436{font-family:system-ui,-apple-system,sans-serif;font-size:16px;fill:#333;}@keyframes edge-animation-frame{from{stroke-dashoffset:0;}}@keyframes dash{to{stroke-dashoffset:0;}}#diagram-1775215255436 .edge-animation-slow{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 50s linear infinite;stroke-linecap:round;}#diagram-1775215255436 .edge-animation-fast{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 20s linear infinite;stroke-linecap:round;}#diagram-1775215255436 .error-icon{fill:#552222;}#diagram-1775215255436 .error-text{fill:#552222;stroke:#552222;}#diagram-1775215255436 .edge-thickness-normal{stroke-width:1px;}#diagram-1775215255436 .edge-thickness-thick{stroke-width:3.5px;}#diagram-1775215255436 .edge-pattern-solid{stroke-dasharray:0;}#diagram-1775215255436 .edge-thickness-invisible{stroke-width:0;fill:none;}#diagram-1775215255436 .edge-pattern-dashed{stroke-dasharray:3;}#diagram-1775215255436 .edge-pattern-dotted{stroke-dasharray:2;}#diagram-1775215255436 .marker{fill:#333333;stroke:#333333;}#diagram-1775215255436 .marker.cross{stroke:#333333;}#diagram-1775215255436 svg{font-family:system-ui,-apple-system,sans-serif;font-size:16px;}#diagram-1775215255436 p{margin:0;}#diagram-1775215255436 .label{font-family:system-ui,-apple-system,sans-serif;color:#333;}#diagram-1775215255436 .cluster-label text{fill:#333;}#diagram-1775215255436 .cluster-label span{color:#333;}#diagram-1775215255436 .cluster-label span p{background-color:transparent;}#diagram-1775215255436 .label text,#diagram-1775215255436 
span{fill:#333;color:#333;}#diagram-1775215255436 .node rect,#diagram-1775215255436 .node circle,#diagram-1775215255436 .node ellipse,#diagram-1775215255436 .node polygon,#diagram-1775215255436 .node path{fill:#ECECFF;stroke:#9370DB;stroke-width:1px;}#diagram-1775215255436 .rough-node .label text,#diagram-1775215255436 .node .label text,#diagram-1775215255436 .image-shape .label,#diagram-1775215255436 .icon-shape .label{text-anchor:middle;}#diagram-1775215255436 .node .katex path{fill:#000;stroke:#000;stroke-width:1px;}#diagram-1775215255436 .rough-node .label,#diagram-1775215255436 .node .label,#diagram-1775215255436 .image-shape .label,#diagram-1775215255436 .icon-shape .label{text-align:center;}#diagram-1775215255436 .node.clickable{cursor:pointer;}#diagram-1775215255436 .root .anchor path{fill:#333333!important;stroke-width:0;stroke:#333333;}#diagram-1775215255436 .arrowheadPath{fill:#333333;}#diagram-1775215255436 .edgePath .path{stroke:#333333;stroke-width:1px;}#diagram-1775215255436 .flowchart-link{stroke:#333333;fill:none;}#diagram-1775215255436 .edgeLabel{background-color:rgba(232,232,232, 0.8);text-align:center;}#diagram-1775215255436 .edgeLabel p{background-color:rgba(232,232,232, 0.8);}#diagram-1775215255436 .edgeLabel rect{opacity:0.5;background-color:rgba(232,232,232, 0.8);fill:rgba(232,232,232, 0.8);}#diagram-1775215255436 .labelBkg{background-color:rgba(232, 232, 232, 0.5);}#diagram-1775215255436 .cluster rect{fill:#ffffde;stroke:#aaaa33;stroke-width:1px;}#diagram-1775215255436 .cluster text{fill:#333;}#diagram-1775215255436 .cluster span{color:#333;}#diagram-1775215255436 div.mermaidTooltip{position:absolute;text-align:center;max-width:200px;padding:2px;font-family:system-ui,-apple-system,sans-serif;font-size:12px;background:hsl(80, 100%, 96.2745098039%);border:1px solid #aaaa33;border-radius:2px;pointer-events:none;z-index:100;}#diagram-1775215255436 .flowchartTitleText{text-anchor:middle;font-size:18px;fill:#333;}#diagram-1775215255436 
rect.text{fill:none;stroke-width:0;}#diagram-1775215255436 .icon-shape,#diagram-1775215255436 .image-shape{background-color:rgba(232,232,232, 0.8);text-align:center;}#diagram-1775215255436 .icon-shape p,#diagram-1775215255436 .image-shape p{background-color:rgba(232,232,232, 0.8);padding:2px;}#diagram-1775215255436 .icon-shape .label rect,#diagram-1775215255436 .image-shape .label rect{opacity:0.5;background-color:rgba(232,232,232, 0.8);fill:rgba(232,232,232, 0.8);}#diagram-1775215255436 .label-icon{display:inline-block;height:1em;overflow:visible;vertical-align:-0.125em;}#diagram-1775215255436 .node .label-icon path{fill:currentColor;stroke:revert;stroke-width:revert;}#diagram-1775215255436 .node .neo-node{stroke:#9370DB;}#diagram-1775215255436 [data-look=\"neo\"].node rect,#diagram-1775215255436 [data-look=\"neo\"].cluster rect,#diagram-1775215255436 [data-look=\"neo\"].node polygon{stroke:#9370DB;filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215255436 [data-look=\"neo\"].node path{stroke:#9370DB;stroke-width:1px;}#diagram-1775215255436 [data-look=\"neo\"].node .outer-path{filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215255436 [data-look=\"neo\"].node .neo-line path{stroke:#9370DB;filter:none;}#diagram-1775215255436 [data-look=\"neo\"].node circle{stroke:#9370DB;filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215255436 [data-look=\"neo\"].node circle .state-start{fill:#000000;}#diagram-1775215255436 [data-look=\"neo\"].icon-shape .icon{fill:#9370DB;filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215255436 [data-look=\"neo\"].icon-shape .icon-neo path{stroke:#9370DB;filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215255436 :root{--mermaid-font-family:system-ui,-apple-system,sans-serif;}\u003C\u002Fstyle>\u003Cg>\u003Cmarker id=\"diagram-1775215255436_flowchart-v2-pointEnd\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"5\" refY=\"5\" 
markerUnits=\"userSpaceOnUse\" markerWidth=\"8\" markerHeight=\"8\" orient=\"auto\">\u003Cpath d=\"M 0 0 L 10 5 L 0 10 z\" class=\"arrowMarkerPath\" style=\"stroke-width: 1; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255436_flowchart-v2-pointStart\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"4.5\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"8\" markerHeight=\"8\" orient=\"auto\">\u003Cpath d=\"M 0 5 L 10 10 L 10 0 z\" class=\"arrowMarkerPath\" style=\"stroke-width: 1; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255436_flowchart-v2-pointEnd-margin\" class=\"marker flowchart-v2\" viewBox=\"0 0 11.5 14\" refX=\"11.5\" refY=\"7\" markerUnits=\"userSpaceOnUse\" markerWidth=\"10.5\" markerHeight=\"14\" orient=\"auto\">\u003Cpath d=\"M 0 0 L 11.5 7 L 0 14 z\" class=\"arrowMarkerPath\" style=\"stroke-width: 0; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255436_flowchart-v2-pointStart-margin\" class=\"marker flowchart-v2\" viewBox=\"0 0 11.5 14\" refX=\"1\" refY=\"7\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11.5\" markerHeight=\"14\" orient=\"auto\">\u003Cpolygon points=\"0,7 11.5,14 11.5,0\" class=\"arrowMarkerPath\" style=\"stroke-width: 0; stroke-dasharray: 1, 0;\">\u003C\u002Fpolygon>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255436_flowchart-v2-circleEnd\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"11\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11\" markerHeight=\"11\" orient=\"auto\">\u003Ccircle cx=\"5\" cy=\"5\" r=\"5\" class=\"arrowMarkerPath\" style=\"stroke-width: 1; stroke-dasharray: 1, 0;\">\u003C\u002Fcircle>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255436_flowchart-v2-circleStart\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"-1\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11\" markerHeight=\"11\" 
orient=\"auto\">\u003Ccircle cx=\"5\" cy=\"5\" r=\"5\" class=\"arrowMarkerPath\" style=\"stroke-width: 1; stroke-dasharray: 1, 0;\">\u003C\u002Fcircle>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255436_flowchart-v2-circleEnd-margin\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refY=\"5\" refX=\"12.25\" markerUnits=\"userSpaceOnUse\" markerWidth=\"14\" markerHeight=\"14\" orient=\"auto\">\u003Ccircle cx=\"5\" cy=\"5\" r=\"5\" class=\"arrowMarkerPath\" style=\"stroke-width: 0; stroke-dasharray: 1, 0;\">\u003C\u002Fcircle>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255436_flowchart-v2-circleStart-margin\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"-2\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"14\" markerHeight=\"14\" orient=\"auto\">\u003Ccircle cx=\"5\" cy=\"5\" r=\"5\" class=\"arrowMarkerPath\" style=\"stroke-width: 0; stroke-dasharray: 1, 0;\">\u003C\u002Fcircle>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255436_flowchart-v2-crossEnd\" class=\"marker cross flowchart-v2\" viewBox=\"0 0 11 11\" refX=\"12\" refY=\"5.2\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11\" markerHeight=\"11\" orient=\"auto\">\u003Cpath d=\"M 1,1 l 9,9 M 10,1 l -9,9\" class=\"arrowMarkerPath\" style=\"stroke-width: 2; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255436_flowchart-v2-crossStart\" class=\"marker cross flowchart-v2\" viewBox=\"0 0 11 11\" refX=\"-1\" refY=\"5.2\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11\" markerHeight=\"11\" orient=\"auto\">\u003Cpath d=\"M 1,1 l 9,9 M 10,1 l -9,9\" class=\"arrowMarkerPath\" style=\"stroke-width: 2; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255436_flowchart-v2-crossEnd-margin\" class=\"marker cross flowchart-v2\" viewBox=\"0 0 15 15\" refX=\"17.7\" refY=\"7.5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"12\" markerHeight=\"12\" orient=\"auto\">\u003Cpath d=\"M 1,1 L 14,14 
M 1,14 L 14,1\" class=\"arrowMarkerPath\" style=\"stroke-width: 2.5;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255436_flowchart-v2-crossStart-margin\" class=\"marker cross flowchart-v2\" viewBox=\"0 0 15 15\" refX=\"-3.5\" refY=\"7.5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"12\" markerHeight=\"12\" orient=\"auto\">\u003Cpath d=\"M 1,1 L 14,14 M 1,14 L 14,1\" class=\"arrowMarkerPath\" style=\"stroke-width: 2.5; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cg class=\"root\">\u003Cg class=\"clusters\">\u003C\u002Fg>\u003Cg class=\"edgePaths\">\u003Cpath d=\"M161.828,87L165.995,87C170.161,87,178.495,87,186.161,87C193.828,87,200.828,87,204.328,87L207.828,87\" id=\"diagram-1775215255436-L_A_B_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_A_B_0\" data-points=\"W3sieCI6MTYxLjgyODEyNSwieSI6ODd9LHsieCI6MTg2LjgyODEyNSwieSI6ODd9LHsieCI6MjExLjgyODEyNSwieSI6ODd9XQ==\" data-look=\"classic\" marker-end=\"url(#diagram-1775215255436_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003Cpath d=\"M322.248,64.873L332.182,59.894C342.116,54.916,361.984,44.958,377.497,39.979C393.01,35,404.169,35,409.749,35L415.328,35\" id=\"diagram-1775215255436-L_B_C_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_B_C_0\" data-points=\"W3sieCI6MzIyLjI0ODM5NDg2MzU2MzQsInkiOjY0Ljg3MzM5NDg2MzU2MzR9LHsieCI6MzgxLjg1MTU2MjUsInkiOjM1fSx7IngiOjQxOS4zMjgxMjUsInkiOjM1fV0=\" data-look=\"classic\" marker-end=\"url(#diagram-1775215255436_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003Cpath d=\"M322.248,109.127L332.182,114.106C342.116,119.084,361.984,129.042,378.901,134.021C395.818,139,409.784,139,416.767,139L423.75,139\" id=\"diagram-1775215255436-L_B_D_0\" class=\" edge-thickness-normal edge-pattern-solid 
edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_B_D_0\" data-points=\"W3sieCI6MzIyLjI0ODM5NDg2MzU2MzQsInkiOjEwOS4xMjY2MDUxMzY0MzY2fSx7IngiOjM4MS44NTE1NjI1LCJ5IjoxMzl9LHsieCI6NDI3Ljc1LCJ5IjoxMzl9XQ==\" data-look=\"classic\" marker-end=\"url(#diagram-1775215255436_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003Cpath d=\"M637.406,35L641.573,35C645.74,35,654.073,35,661.74,35C669.406,35,676.406,35,679.906,35L683.406,35\" id=\"diagram-1775215255436-L_C_E_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_C_E_0\" data-points=\"W3sieCI6NjM3LjQwNjI1LCJ5IjozNX0seyJ4Ijo2NjIuNDA2MjUsInkiOjM1fSx7IngiOjY4Ny40MDYyNSwieSI6MzV9XQ==\" data-look=\"classic\" marker-end=\"url(#diagram-1775215255436_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003Cpath d=\"M628.984,139L634.555,139C640.125,139,651.266,139,664.668,139C678.07,139,693.734,139,701.566,139L709.398,139\" id=\"diagram-1775215255436-L_D_F_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_D_F_0\" data-points=\"W3sieCI6NjI4Ljk4NDM3NSwieSI6MTM5fSx7IngiOjY2Mi40MDYyNSwieSI6MTM5fSx7IngiOjcxMy4zOTg0Mzc1LCJ5IjoxMzl9XQ==\" data-look=\"classic\" marker-end=\"url(#diagram-1775215255436_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003C\u002Fg>\u003Cg class=\"edgeLabels\">\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_A_B_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg 
class=\"edgeLabel\" transform=\"translate(381.8515625, 35)\">\u003Cg class=\"label\" data-id=\"L_B_C_0\" transform=\"translate(-12.4765625, -12)\">\u003CforeignObject width=\"24.953125\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003Cp>Yes\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"edgeLabel\" transform=\"translate(381.8515625, 139)\">\u003Cg class=\"label\" data-id=\"L_B_D_0\" transform=\"translate(-10.921875, -12)\">\u003CforeignObject width=\"21.84375\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003Cp>No\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_C_E_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_D_F_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel 
\">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"nodes\">\u003Cg class=\"node default  \" id=\"diagram-1775215255436-flowchart-A-0\" data-look=\"classic\" transform=\"translate(84.9140625, 87)\">\u003Crect class=\"basic label-container\" style=\"\" x=\"-76.9140625\" y=\"-27\" width=\"153.828125\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"\" transform=\"translate(-46.9140625, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"93.828125\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"nodeLabel \">\u003Cp>New Prompt\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215255436-flowchart-B-1\" data-look=\"classic\" transform=\"translate(278.1015625, 87)\">\u003Cpolygon points=\"66.2734375,0 132.546875,-66.2734375 66.2734375,-132.546875 0,-66.2734375\" class=\"label-container\" transform=\"translate(-65.7734375, 66.2734375)\">\u003C\u002Fpolygon>\u003Cg class=\"label\" style=\"\" transform=\"translate(-39.2734375, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"78.546875\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"nodeLabel \">\u003Cp>Cache Hit?\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215255436-flowchart-C-3\" data-look=\"classic\" transform=\"translate(528.3671875, 35)\">\u003Crect class=\"basic label-container\" style=\"fill:#22c55e !important\" x=\"-109.0390625\" y=\"-27\" width=\"218.078125\" 
height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"color:#fff !important\" transform=\"translate(-79.0390625, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"158.078125\" height=\"24\">\u003Cdiv style=\"color: rgb(255, 255, 255) !important; display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\" xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\">\u003Cspan style=\"color:#fff !important\" class=\"nodeLabel \">\u003Cp>Route to Warm Node\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215255436-flowchart-D-5\" data-look=\"classic\" transform=\"translate(528.3671875, 139)\">\u003Crect class=\"basic label-container\" style=\"\" x=\"-100.6171875\" y=\"-27\" width=\"201.234375\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"\" transform=\"translate(-70.6171875, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"141.234375\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"nodeLabel \">\u003Cp>Route to Any Node\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215255436-flowchart-E-7\" data-look=\"classic\" transform=\"translate(802.0390625, 35)\">\u003Crect class=\"basic label-container\" style=\"\" x=\"-114.6328125\" y=\"-27\" width=\"229.265625\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"\" transform=\"translate(-84.6328125, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"169.265625\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: 
center;\">\u003Cspan class=\"nodeLabel \">\u003Cp>Low Latency Response\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215255436-flowchart-F-9\" data-look=\"classic\" transform=\"translate(802.0390625, 139)\">\u003Crect class=\"basic label-container\" style=\"fill:#f59e0b !important\" x=\"-88.640625\" y=\"-27\" width=\"177.28125\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"color:#fff !important\" transform=\"translate(-58.640625, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"117.28125\" height=\"24\">\u003Cdiv style=\"color: rgb(255, 255, 255) !important; display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\" xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\">\u003Cspan style=\"color:#fff !important\" class=\"nodeLabel \">\u003Cp>Prefill + Decode\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003C\u002Fg>\u003C\u002Fg>\u003C\u002Fg>\u003Cdefs>\u003Cfilter id=\"diagram-1775215255436-drop-shadow\" height=\"130%\" width=\"130%\">\u003CfeDropShadow dx=\"4\" dy=\"4\" stdDeviation=\"0\" flood-opacity=\"0.06\" flood-color=\"#000000\">\u003C\u002FfeDropShadow>\u003C\u002Ffilter>\u003C\u002Fdefs>\u003Cdefs>\u003Cfilter id=\"diagram-1775215255436-drop-shadow-small\" height=\"150%\" width=\"150%\">\u003CfeDropShadow dx=\"2\" dy=\"2\" stdDeviation=\"0\" flood-opacity=\"0.06\" flood-color=\"#000000\">\u003C\u002FfeDropShadow>\u003C\u002Ffilter>\u003C\u002Fdefs>\u003Ctext x=\"919.671875\" y=\"194\" text-anchor=\"end\" fill=\"#6b7280\" stroke=\"#ffffff\" stroke-width=\"3\" paint-order=\"stroke\" font-size=\"11\" font-family=\"system-ui, sans-serif\" opacity=\"0.7\">coreprose.com\u003C\u002Ftext>\u003C\u002Fsvg>\n\u003C\u002Fdiv>\n\n📊 **Practical effect:** Users see better latency from cache-warm routing and higher GPU utilization by 
assigning accelerators to specific pipeline stages instead of cloning full stacks per replica.[2]\n\n### Disaggregated serving and workload-aware scheduling\n\nSeparating prefill and decode lets llm-d:\n\n- Reduce redundant replication of model state.[2][8]  \n- Assign hardware by workload shape (short chat, long-context, large batch).[3][8]  \n- Improve:\n  - **Cost per request** via fewer fully replicated servers.[2][8]  \n  - **Time-to-first-token (TTFT)** with prefill-optimized nodes.  \n  - **Time-per-output-token (TPOT)** via stable decode pipelines.[9]\n\nllm-d is tuned for:\n\n- Long-running multi-step prompts  \n- Retrieval-augmented generation  \n- Agentic workflows[6][7]\n\nThese high-value enterprise patterns put heavy pressure on cache management and scheduling.\n\nVendors like Mistral AI note that next-gen models (e.g., Mixture of Experts) require robust KV cache management and disaggregated serving, which is exactly llm-d’s focus.[1]\n\n💡 **Section takeaway:** llm-d exposes cache locality and phase-aware scheduling as explicit controls, turning raw accelerator capacity into better latency and lower cost for real workloads.\n\n---\n\n## 4. Multi-Accelerator and Topology-Aware Inference\n\nThe same mechanisms also let llm-d treat heterogeneous hardware as one programmable pool. Modern clusters often mix:\n\n- High-end GPUs for interactive chat  \n- Memory-rich accelerators for long-context reasoning  \n- Custom ASICs\u002FTPUs for batch or offline jobs[3]\n\nllm-d offers:\n\n- A unified recipe and scheduler that understands accelerator classes.  
\n- Hardware selection based on workload pattern, not manual guesswork.[3]\n\n### Topology and interconnect awareness\n\nllm-d surfaces interconnect details—from NUMA layouts to network fabrics and GPU peer-to-peer bandwidth—so communication-heavy workloads land where overhead is minimized.[3][10]\n\nExpressed via Kubernetes primitives:\n\n- Node labels\u002Ftaints for accelerator type and topology  \n- Affinity\u002Fanti-affinity and scheduling constraints  \n- Standard observability for monitoring hot paths[3][9]\n\n\u003Cdiv class=\"mermaid-diagram not-prose my-6\" role=\"img\" aria-label=\"Diagram\">\n\u003Csvg id=\"diagram-1775215255995\" width=\"100%\" xmlns=\"http:\u002F\u002Fwww.w3.org\u002F2000\u002Fsvg\" class=\"flowchart\" style=\"max-width: 746.40625px;\" viewBox=\"0 0 746.40625 403.359375\" role=\"graphics-document document\" aria-roledescription=\"flowchart-v2\">\u003Cstyle>#diagram-1775215255995{font-family:system-ui,-apple-system,sans-serif;font-size:16px;fill:#333;}@keyframes edge-animation-frame{from{stroke-dashoffset:0;}}@keyframes dash{to{stroke-dashoffset:0;}}#diagram-1775215255995 .edge-animation-slow{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 50s linear infinite;stroke-linecap:round;}#diagram-1775215255995 .edge-animation-fast{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 20s linear infinite;stroke-linecap:round;}#diagram-1775215255995 .error-icon{fill:#552222;}#diagram-1775215255995 .error-text{fill:#552222;stroke:#552222;}#diagram-1775215255995 .edge-thickness-normal{stroke-width:1px;}#diagram-1775215255995 .edge-thickness-thick{stroke-width:3.5px;}#diagram-1775215255995 .edge-pattern-solid{stroke-dasharray:0;}#diagram-1775215255995 .edge-thickness-invisible{stroke-width:0;fill:none;}#diagram-1775215255995 .edge-pattern-dashed{stroke-dasharray:3;}#diagram-1775215255995 .edge-pattern-dotted{stroke-dasharray:2;}#diagram-1775215255995 .marker{fill:#333333;stroke:#333333;}#diagram-1775215255995 
.marker.cross{stroke:#333333;}#diagram-1775215255995 svg{font-family:system-ui,-apple-system,sans-serif;font-size:16px;}#diagram-1775215255995 p{margin:0;}#diagram-1775215255995 .label{font-family:system-ui,-apple-system,sans-serif;color:#333;}#diagram-1775215255995 .cluster-label text{fill:#333;}#diagram-1775215255995 .cluster-label span{color:#333;}#diagram-1775215255995 .cluster-label span p{background-color:transparent;}#diagram-1775215255995 .label text,#diagram-1775215255995 span{fill:#333;color:#333;}#diagram-1775215255995 .node rect,#diagram-1775215255995 .node circle,#diagram-1775215255995 .node ellipse,#diagram-1775215255995 .node polygon,#diagram-1775215255995 .node path{fill:#ECECFF;stroke:#9370DB;stroke-width:1px;}#diagram-1775215255995 .rough-node .label text,#diagram-1775215255995 .node .label text,#diagram-1775215255995 .image-shape .label,#diagram-1775215255995 .icon-shape .label{text-anchor:middle;}#diagram-1775215255995 .node .katex path{fill:#000;stroke:#000;stroke-width:1px;}#diagram-1775215255995 .rough-node .label,#diagram-1775215255995 .node .label,#diagram-1775215255995 .image-shape .label,#diagram-1775215255995 .icon-shape .label{text-align:center;}#diagram-1775215255995 .node.clickable{cursor:pointer;}#diagram-1775215255995 .root .anchor path{fill:#333333!important;stroke-width:0;stroke:#333333;}#diagram-1775215255995 .arrowheadPath{fill:#333333;}#diagram-1775215255995 .edgePath .path{stroke:#333333;stroke-width:1px;}#diagram-1775215255995 .flowchart-link{stroke:#333333;fill:none;}#diagram-1775215255995 .edgeLabel{background-color:rgba(232,232,232, 0.8);text-align:center;}#diagram-1775215255995 .edgeLabel p{background-color:rgba(232,232,232, 0.8);}#diagram-1775215255995 .edgeLabel rect{opacity:0.5;background-color:rgba(232,232,232, 0.8);fill:rgba(232,232,232, 0.8);}#diagram-1775215255995 .labelBkg{background-color:rgba(232, 232, 232, 0.5);}#diagram-1775215255995 .cluster 
rect{fill:#ffffde;stroke:#aaaa33;stroke-width:1px;}#diagram-1775215255995 .cluster text{fill:#333;}#diagram-1775215255995 .cluster span{color:#333;}#diagram-1775215255995 div.mermaidTooltip{position:absolute;text-align:center;max-width:200px;padding:2px;font-family:system-ui,-apple-system,sans-serif;font-size:12px;background:hsl(80, 100%, 96.2745098039%);border:1px solid #aaaa33;border-radius:2px;pointer-events:none;z-index:100;}#diagram-1775215255995 .flowchartTitleText{text-anchor:middle;font-size:18px;fill:#333;}#diagram-1775215255995 rect.text{fill:none;stroke-width:0;}#diagram-1775215255995 .icon-shape,#diagram-1775215255995 .image-shape{background-color:rgba(232,232,232, 0.8);text-align:center;}#diagram-1775215255995 .icon-shape p,#diagram-1775215255995 .image-shape p{background-color:rgba(232,232,232, 0.8);padding:2px;}#diagram-1775215255995 .icon-shape .label rect,#diagram-1775215255995 .image-shape .label rect{opacity:0.5;background-color:rgba(232,232,232, 0.8);fill:rgba(232,232,232, 0.8);}#diagram-1775215255995 .label-icon{display:inline-block;height:1em;overflow:visible;vertical-align:-0.125em;}#diagram-1775215255995 .node .label-icon path{fill:currentColor;stroke:revert;stroke-width:revert;}#diagram-1775215255995 .node .neo-node{stroke:#9370DB;}#diagram-1775215255995 [data-look=\"neo\"].node rect,#diagram-1775215255995 [data-look=\"neo\"].cluster rect,#diagram-1775215255995 [data-look=\"neo\"].node polygon{stroke:#9370DB;filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215255995 [data-look=\"neo\"].node path{stroke:#9370DB;stroke-width:1px;}#diagram-1775215255995 [data-look=\"neo\"].node .outer-path{filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215255995 [data-look=\"neo\"].node .neo-line path{stroke:#9370DB;filter:none;}#diagram-1775215255995 [data-look=\"neo\"].node circle{stroke:#9370DB;filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215255995 [data-look=\"neo\"].node circle 
.state-start{fill:#000000;}#diagram-1775215255995 [data-look=\"neo\"].icon-shape .icon{fill:#9370DB;filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215255995 [data-look=\"neo\"].icon-shape .icon-neo path{stroke:#9370DB;filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215255995 :root{--mermaid-font-family:system-ui,-apple-system,sans-serif;}\u003C\u002Fstyle>\u003Cg>\u003Cmarker id=\"diagram-1775215255995_flowchart-v2-pointEnd\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"5\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"8\" markerHeight=\"8\" orient=\"auto\">\u003Cpath d=\"M 0 0 L 10 5 L 0 10 z\" class=\"arrowMarkerPath\" style=\"stroke-width: 1; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255995_flowchart-v2-pointStart\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"4.5\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"8\" markerHeight=\"8\" orient=\"auto\">\u003Cpath d=\"M 0 5 L 10 10 L 10 0 z\" class=\"arrowMarkerPath\" style=\"stroke-width: 1; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255995_flowchart-v2-pointEnd-margin\" class=\"marker flowchart-v2\" viewBox=\"0 0 11.5 14\" refX=\"11.5\" refY=\"7\" markerUnits=\"userSpaceOnUse\" markerWidth=\"10.5\" markerHeight=\"14\" orient=\"auto\">\u003Cpath d=\"M 0 0 L 11.5 7 L 0 14 z\" class=\"arrowMarkerPath\" style=\"stroke-width: 0; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255995_flowchart-v2-pointStart-margin\" class=\"marker flowchart-v2\" viewBox=\"0 0 11.5 14\" refX=\"1\" refY=\"7\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11.5\" markerHeight=\"14\" orient=\"auto\">\u003Cpolygon points=\"0,7 11.5,14 11.5,0\" class=\"arrowMarkerPath\" style=\"stroke-width: 0; stroke-dasharray: 1, 0;\">\u003C\u002Fpolygon>\u003C\u002Fmarker>\u003Cmarker 
id=\"diagram-1775215255995_flowchart-v2-circleEnd\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"11\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11\" markerHeight=\"11\" orient=\"auto\">\u003Ccircle cx=\"5\" cy=\"5\" r=\"5\" class=\"arrowMarkerPath\" style=\"stroke-width: 1; stroke-dasharray: 1, 0;\">\u003C\u002Fcircle>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255995_flowchart-v2-circleStart\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"-1\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11\" markerHeight=\"11\" orient=\"auto\">\u003Ccircle cx=\"5\" cy=\"5\" r=\"5\" class=\"arrowMarkerPath\" style=\"stroke-width: 1; stroke-dasharray: 1, 0;\">\u003C\u002Fcircle>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255995_flowchart-v2-circleEnd-margin\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refY=\"5\" refX=\"12.25\" markerUnits=\"userSpaceOnUse\" markerWidth=\"14\" markerHeight=\"14\" orient=\"auto\">\u003Ccircle cx=\"5\" cy=\"5\" r=\"5\" class=\"arrowMarkerPath\" style=\"stroke-width: 0; stroke-dasharray: 1, 0;\">\u003C\u002Fcircle>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255995_flowchart-v2-circleStart-margin\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"-2\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"14\" markerHeight=\"14\" orient=\"auto\">\u003Ccircle cx=\"5\" cy=\"5\" r=\"5\" class=\"arrowMarkerPath\" style=\"stroke-width: 0; stroke-dasharray: 1, 0;\">\u003C\u002Fcircle>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255995_flowchart-v2-crossEnd\" class=\"marker cross flowchart-v2\" viewBox=\"0 0 11 11\" refX=\"12\" refY=\"5.2\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11\" markerHeight=\"11\" orient=\"auto\">\u003Cpath d=\"M 1,1 l 9,9 M 10,1 l -9,9\" class=\"arrowMarkerPath\" style=\"stroke-width: 2; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255995_flowchart-v2-crossStart\" class=\"marker 
cross flowchart-v2\" viewBox=\"0 0 11 11\" refX=\"-1\" refY=\"5.2\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11\" markerHeight=\"11\" orient=\"auto\">\u003Cpath d=\"M 1,1 l 9,9 M 10,1 l -9,9\" class=\"arrowMarkerPath\" style=\"stroke-width: 2; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255995_flowchart-v2-crossEnd-margin\" class=\"marker cross flowchart-v2\" viewBox=\"0 0 15 15\" refX=\"17.7\" refY=\"7.5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"12\" markerHeight=\"12\" orient=\"auto\">\u003Cpath d=\"M 1,1 L 14,14 M 1,14 L 14,1\" class=\"arrowMarkerPath\" style=\"stroke-width: 2.5;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255995_flowchart-v2-crossStart-margin\" class=\"marker cross flowchart-v2\" viewBox=\"0 0 15 15\" refX=\"-3.5\" refY=\"7.5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"12\" markerHeight=\"12\" orient=\"auto\">\u003Cpath d=\"M 1,1 L 14,14 M 1,14 L 14,1\" class=\"arrowMarkerPath\" style=\"stroke-width: 2.5; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cg class=\"root\">\u003Cg class=\"clusters\">\u003C\u002Fg>\u003Cg class=\"edgePaths\">\u003Cpath d=\"M274.461,52.489L246.214,58.241C217.966,63.993,161.471,75.496,133.224,90.214C104.977,104.932,104.977,122.865,104.977,131.831L104.977,140.797\" id=\"diagram-1775215255995-L_A_B_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_A_B_0\" data-points=\"W3sieCI6Mjc0LjQ2MDkzNzUsInkiOjUyLjQ4OTIzMTUyMjI3MTE3fSx7IngiOjEwNC45NzY1NjI1LCJ5Ijo4N30seyJ4IjoxMDQuOTc2NTYyNSwieSI6MTQ0Ljc5Njg3NX1d\" data-look=\"classic\" marker-end=\"url(#diagram-1775215255995_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003Cpath d=\"M360.352,62L360.352,66.167C360.352,70.333,360.352,78.667,360.352,86.333C360.352,94,360.352,101,360.352,104.5L360.352,108\" id=\"diagram-1775215255995-L_A_C_0\" class=\" 
edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_A_C_0\" data-points=\"W3sieCI6MzYwLjM1MTU2MjUsInkiOjYyfSx7IngiOjM2MC4zNTE1NjI1LCJ5Ijo4N30seyJ4IjozNjAuMzUxNTYyNSwieSI6MTEyfV0=\" data-look=\"classic\" marker-end=\"url(#diagram-1775215255995_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003Cpath d=\"M446.242,51.651L476.632,57.543C507.021,63.434,567.799,75.217,598.189,89.411C628.578,103.604,628.578,120.208,628.578,128.51L628.578,136.813\" id=\"diagram-1775215255995-L_A_D_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_A_D_0\" data-points=\"W3sieCI6NDQ2LjI0MjE4NzUsInkiOjUxLjY1MTI2ODQ1ODkxNzA3fSx7IngiOjYyOC41NzgxMjUsInkiOjg3fSx7IngiOjYyOC41NzgxMjUsInkiOjE0MC44MTI1fV0=\" data-look=\"classic\" marker-end=\"url(#diagram-1775215255995_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003Cpath d=\"M104.977,233.563L104.977,243.195C104.977,252.828,104.977,272.094,104.977,285.227C104.977,298.359,104.977,305.359,104.977,308.859L104.977,312.359\" id=\"diagram-1775215255995-L_B_E_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_B_E_0\" data-points=\"W3sieCI6MTA0Ljk3NjU2MjUsInkiOjIzMy41NjI1fSx7IngiOjEwNC45NzY1NjI1LCJ5IjoyOTEuMzU5Mzc1fSx7IngiOjEwNC45NzY1NjI1LCJ5IjozMTYuMzU5Mzc1fV0=\" data-look=\"classic\" marker-end=\"url(#diagram-1775215255995_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003Cpath d=\"M360.352,266.359L360.352,270.526C360.352,274.693,360.352,283.026,360.352,290.693C360.352,298.359,360.352,305.359,360.352,308.859L360.352,312.359\" id=\"diagram-1775215255995-L_C_F_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" 
data-id=\"L_C_F_0\" data-points=\"W3sieCI6MzYwLjM1MTU2MjUsInkiOjI2Ni4zNTkzNzV9LHsieCI6MzYwLjM1MTU2MjUsInkiOjI5MS4zNTkzNzV9LHsieCI6MzYwLjM1MTU2MjUsInkiOjMxNi4zNTkzNzV9XQ==\" data-look=\"classic\" marker-end=\"url(#diagram-1775215255995_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003Cpath d=\"M628.578,237.547L628.578,246.516C628.578,255.484,628.578,273.422,628.578,285.891C628.578,298.359,628.578,305.359,628.578,308.859L628.578,312.359\" id=\"diagram-1775215255995-L_D_G_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_D_G_0\" data-points=\"W3sieCI6NjI4LjU3ODEyNSwieSI6MjM3LjU0Njg3NX0seyJ4Ijo2MjguNTc4MTI1LCJ5IjoyOTEuMzU5Mzc1fSx7IngiOjYyOC41NzgxMjUsInkiOjMxNi4zNTkzNzV9XQ==\" data-look=\"classic\" marker-end=\"url(#diagram-1775215255995_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003C\u002Fg>\u003Cg class=\"edgeLabels\">\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_A_B_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_A_C_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_A_D_0\" transform=\"translate(0, 0)\">\u003CforeignObject 
width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_B_E_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_C_F_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_D_G_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"nodes\">\u003Cg class=\"node default  \" id=\"diagram-1775215255995-flowchart-A-0\" data-look=\"classic\" transform=\"translate(360.3515625, 35)\">\u003Crect class=\"basic label-container\" style=\"\" x=\"-85.890625\" y=\"-27\" 
width=\"171.78125\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"\" transform=\"translate(-55.890625, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"111.78125\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"nodeLabel \">\u003Cp>Workload Type\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215255995-flowchart-B-1\" data-look=\"classic\" transform=\"translate(104.9765625, 189.1796875)\">\u003Cpolygon points=\"44.3828125,0 88.765625,-44.3828125 44.3828125,-88.765625 0,-44.3828125\" class=\"label-container\" transform=\"translate(-43.8828125, 44.3828125)\">\u003C\u002Fpolygon>\u003Cg class=\"label\" style=\"\" transform=\"translate(-17.3828125, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"34.765625\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"nodeLabel \">\u003Cp>Chat\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215255995-flowchart-C-3\" data-look=\"classic\" transform=\"translate(360.3515625, 189.1796875)\">\u003Cpolygon points=\"77.1796875,0 154.359375,-77.1796875 77.1796875,-154.359375 0,-77.1796875\" class=\"label-container\" transform=\"translate(-76.6796875, 77.1796875)\">\u003C\u002Fpolygon>\u003Cg class=\"label\" style=\"\" transform=\"translate(-50.1796875, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"100.359375\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; 
max-width: 200px; text-align: center;\">\u003Cspan class=\"nodeLabel \">\u003Cp>Long Context\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215255995-flowchart-D-5\" data-look=\"classic\" transform=\"translate(628.578125, 189.1796875)\">\u003Cpolygon points=\"48.3671875,0 96.734375,-48.3671875 48.3671875,-96.734375 0,-48.3671875\" class=\"label-container\" transform=\"translate(-47.8671875, 48.3671875)\">\u003C\u002Fpolygon>\u003Cg class=\"label\" style=\"\" transform=\"translate(-21.3671875, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"42.734375\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"nodeLabel \">\u003Cp>Batch\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215255995-flowchart-E-7\" data-look=\"classic\" transform=\"translate(104.9765625, 343.359375)\">\u003Crect class=\"basic label-container\" style=\"fill:#22c55e !important\" x=\"-96.9765625\" y=\"-27\" width=\"193.953125\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"color:#fff !important\" transform=\"translate(-66.9765625, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"133.953125\" height=\"24\">\u003Cdiv style=\"color: rgb(255, 255, 255) !important; display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\" xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\">\u003Cspan style=\"color:#fff !important\" class=\"nodeLabel \">\u003Cp>Low-latency GPUs\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215255995-flowchart-F-9\" data-look=\"classic\" 
transform=\"translate(360.3515625, 343.359375)\">\u003Crect class=\"basic label-container\" style=\"fill:#0ea5e9 !important\" x=\"-108.3984375\" y=\"-27\" width=\"216.796875\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"color:#fff !important\" transform=\"translate(-78.3984375, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"156.796875\" height=\"24\">\u003Cdiv style=\"color: rgb(255, 255, 255) !important; display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\" xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\">\u003Cspan style=\"color:#fff !important\" class=\"nodeLabel \">\u003Cp>High-memory Nodes\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215255995-flowchart-G-11\" data-look=\"classic\" transform=\"translate(628.578125, 343.359375)\">\u003Crect class=\"basic label-container\" style=\"fill:#f59e0b !important\" x=\"-109.828125\" y=\"-27\" width=\"219.65625\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"color:#fff !important\" transform=\"translate(-79.828125, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"159.65625\" height=\"24\">\u003Cdiv style=\"color: rgb(255, 255, 255) !important; display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\" xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\">\u003Cspan style=\"color:#fff !important\" class=\"nodeLabel \">\u003Cp>Cost-optimized ASICs\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003C\u002Fg>\u003C\u002Fg>\u003C\u002Fg>\u003Cdefs>\u003Cfilter id=\"diagram-1775215255995-drop-shadow\" height=\"130%\" width=\"130%\">\u003CfeDropShadow dx=\"4\" dy=\"4\" stdDeviation=\"0\" flood-opacity=\"0.06\" 
flood-color=\"#000000\">\u003C\u002FfeDropShadow>\u003C\u002Ffilter>\u003C\u002Fdefs>\u003Cdefs>\u003Cfilter id=\"diagram-1775215255995-drop-shadow-small\" height=\"150%\" width=\"150%\">\u003CfeDropShadow dx=\"2\" dy=\"2\" stdDeviation=\"0\" flood-opacity=\"0.06\" flood-color=\"#000000\">\u003C\u002FfeDropShadow>\u003C\u002Ffilter>\u003C\u002Fdefs>\u003Ctext x=\"741.40625\" y=\"398.359375\" text-anchor=\"end\" fill=\"#6b7280\" stroke=\"#ffffff\" stroke-width=\"3\" paint-order=\"stroke\" font-size=\"11\" font-family=\"system-ui, sans-serif\" opacity=\"0.7\">coreprose.com\u003C\u002Ftext>\u003C\u002Fsvg>\n\u003C\u002Fdiv>\n\n📊 **Planning aid:** Platform teams get a practical scorecard for mixing accelerators by workload—chat, long-context, batch—rather than guessing hardware purchases and placement.[3]\n\nThis multi-accelerator strategy aligns with industry trends: GPU and CPU vendors back llm-d so their hardware participates in a standardized, open inference stack.[1][10]\n\n⚡ **Section takeaway:** llm-d turns heterogeneous hardware and complex topology into declarative scheduling inputs, enabling portable, vendor-neutral AI fabrics.\n\n---\n\n## 5. Adoption Path: From First Cluster to Production Platform\n\nllm-d pairs advanced capabilities with a realistic adoption path.\n\n### From quickstart to optimized platforms\n\nOfficial guides and Helm charts provide:\n\n- Tested, benchmarked recipes for high-performance deployments.[9]  \n- Requirements: only basic Kubernetes familiarity.  
\n- Targets:\n  - Single-model deployments across tens\u002Fhundreds of nodes  \n  - Multi-tenant model-as-a-service platforms sharing deployments[9]\n\nThe “well-lit path” includes curated configs for:\n\n- Intelligent inference scheduling  \n- Prefill\u002Fdecode disaggregation  \n- KV cache aware routing[9]\n\n\u003Cdiv class=\"mermaid-diagram not-prose my-6\" role=\"img\" aria-label=\"Diagram\">\n\u003Csvg id=\"diagram-1775215256593\" width=\"100%\" xmlns=\"http:\u002F\u002Fwww.w3.org\u002F2000\u002Fsvg\" class=\"flowchart\" style=\"max-width: 1231.578125px;\" viewBox=\"0 0 1231.578125 95\" role=\"graphics-document document\" aria-roledescription=\"flowchart-v2\">\u003Cstyle>#diagram-1775215256593{font-family:system-ui,-apple-system,sans-serif;font-size:16px;fill:#333;}@keyframes edge-animation-frame{from{stroke-dashoffset:0;}}@keyframes dash{to{stroke-dashoffset:0;}}#diagram-1775215256593 .edge-animation-slow{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 50s linear infinite;stroke-linecap:round;}#diagram-1775215256593 .edge-animation-fast{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 20s linear infinite;stroke-linecap:round;}#diagram-1775215256593 .error-icon{fill:#552222;}#diagram-1775215256593 .error-text{fill:#552222;stroke:#552222;}#diagram-1775215256593 .edge-thickness-normal{stroke-width:1px;}#diagram-1775215256593 .edge-thickness-thick{stroke-width:3.5px;}#diagram-1775215256593 .edge-pattern-solid{stroke-dasharray:0;}#diagram-1775215256593 .edge-thickness-invisible{stroke-width:0;fill:none;}#diagram-1775215256593 .edge-pattern-dashed{stroke-dasharray:3;}#diagram-1775215256593 .edge-pattern-dotted{stroke-dasharray:2;}#diagram-1775215256593 .marker{fill:#333333;stroke:#333333;}#diagram-1775215256593 .marker.cross{stroke:#333333;}#diagram-1775215256593 svg{font-family:system-ui,-apple-system,sans-serif;font-size:16px;}#diagram-1775215256593 p{margin:0;}#diagram-1775215256593 
.label{font-family:system-ui,-apple-system,sans-serif;color:#333;}#diagram-1775215256593 .cluster-label text{fill:#333;}#diagram-1775215256593 .cluster-label span{color:#333;}#diagram-1775215256593 .cluster-label span p{background-color:transparent;}#diagram-1775215256593 .label text,#diagram-1775215256593 span{fill:#333;color:#333;}#diagram-1775215256593 .node rect,#diagram-1775215256593 .node circle,#diagram-1775215256593 .node ellipse,#diagram-1775215256593 .node polygon,#diagram-1775215256593 .node path{fill:#ECECFF;stroke:#9370DB;stroke-width:1px;}#diagram-1775215256593 .rough-node .label text,#diagram-1775215256593 .node .label text,#diagram-1775215256593 .image-shape .label,#diagram-1775215256593 .icon-shape .label{text-anchor:middle;}#diagram-1775215256593 .node .katex path{fill:#000;stroke:#000;stroke-width:1px;}#diagram-1775215256593 .rough-node .label,#diagram-1775215256593 .node .label,#diagram-1775215256593 .image-shape .label,#diagram-1775215256593 .icon-shape .label{text-align:center;}#diagram-1775215256593 .node.clickable{cursor:pointer;}#diagram-1775215256593 .root .anchor path{fill:#333333!important;stroke-width:0;stroke:#333333;}#diagram-1775215256593 .arrowheadPath{fill:#333333;}#diagram-1775215256593 .edgePath .path{stroke:#333333;stroke-width:1px;}#diagram-1775215256593 .flowchart-link{stroke:#333333;fill:none;}#diagram-1775215256593 .edgeLabel{background-color:rgba(232,232,232, 0.8);text-align:center;}#diagram-1775215256593 .edgeLabel p{background-color:rgba(232,232,232, 0.8);}#diagram-1775215256593 .edgeLabel rect{opacity:0.5;background-color:rgba(232,232,232, 0.8);fill:rgba(232,232,232, 0.8);}#diagram-1775215256593 .labelBkg{background-color:rgba(232, 232, 232, 0.5);}#diagram-1775215256593 .cluster rect{fill:#ffffde;stroke:#aaaa33;stroke-width:1px;}#diagram-1775215256593 .cluster text{fill:#333;}#diagram-1775215256593 .cluster span{color:#333;}#diagram-1775215256593 
div.mermaidTooltip{position:absolute;text-align:center;max-width:200px;padding:2px;font-family:system-ui,-apple-system,sans-serif;font-size:12px;background:hsl(80, 100%, 96.2745098039%);border:1px solid #aaaa33;border-radius:2px;pointer-events:none;z-index:100;}#diagram-1775215256593 .flowchartTitleText{text-anchor:middle;font-size:18px;fill:#333;}#diagram-1775215256593 rect.text{fill:none;stroke-width:0;}#diagram-1775215256593 .icon-shape,#diagram-1775215256593 .image-shape{background-color:rgba(232,232,232, 0.8);text-align:center;}#diagram-1775215256593 .icon-shape p,#diagram-1775215256593 .image-shape p{background-color:rgba(232,232,232, 0.8);padding:2px;}#diagram-1775215256593 .icon-shape .label rect,#diagram-1775215256593 .image-shape .label rect{opacity:0.5;background-color:rgba(232,232,232, 0.8);fill:rgba(232,232,232, 0.8);}#diagram-1775215256593 .label-icon{display:inline-block;height:1em;overflow:visible;vertical-align:-0.125em;}#diagram-1775215256593 .node .label-icon path{fill:currentColor;stroke:revert;stroke-width:revert;}#diagram-1775215256593 .node .neo-node{stroke:#9370DB;}#diagram-1775215256593 [data-look=\"neo\"].node rect,#diagram-1775215256593 [data-look=\"neo\"].cluster rect,#diagram-1775215256593 [data-look=\"neo\"].node polygon{stroke:#9370DB;filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215256593 [data-look=\"neo\"].node path{stroke:#9370DB;stroke-width:1px;}#diagram-1775215256593 [data-look=\"neo\"].node .outer-path{filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215256593 [data-look=\"neo\"].node .neo-line path{stroke:#9370DB;filter:none;}#diagram-1775215256593 [data-look=\"neo\"].node circle{stroke:#9370DB;filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215256593 [data-look=\"neo\"].node circle .state-start{fill:#000000;}#diagram-1775215256593 [data-look=\"neo\"].icon-shape .icon{fill:#9370DB;filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215256593 
[data-look=\"neo\"].icon-shape .icon-neo path{stroke:#9370DB;filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215256593 :root{--mermaid-font-family:system-ui,-apple-system,sans-serif;}\u003C\u002Fstyle>\u003Cg>\u003Cmarker id=\"diagram-1775215256593_flowchart-v2-pointEnd\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"5\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"8\" markerHeight=\"8\" orient=\"auto\">\u003Cpath d=\"M 0 0 L 10 5 L 0 10 z\" class=\"arrowMarkerPath\" style=\"stroke-width: 1; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215256593_flowchart-v2-pointStart\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"4.5\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"8\" markerHeight=\"8\" orient=\"auto\">\u003Cpath d=\"M 0 5 L 10 10 L 10 0 z\" class=\"arrowMarkerPath\" style=\"stroke-width: 1; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215256593_flowchart-v2-pointEnd-margin\" class=\"marker flowchart-v2\" viewBox=\"0 0 11.5 14\" refX=\"11.5\" refY=\"7\" markerUnits=\"userSpaceOnUse\" markerWidth=\"10.5\" markerHeight=\"14\" orient=\"auto\">\u003Cpath d=\"M 0 0 L 11.5 7 L 0 14 z\" class=\"arrowMarkerPath\" style=\"stroke-width: 0; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215256593_flowchart-v2-pointStart-margin\" class=\"marker flowchart-v2\" viewBox=\"0 0 11.5 14\" refX=\"1\" refY=\"7\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11.5\" markerHeight=\"14\" orient=\"auto\">\u003Cpolygon points=\"0,7 11.5,14 11.5,0\" class=\"arrowMarkerPath\" style=\"stroke-width: 0; stroke-dasharray: 1, 0;\">\u003C\u002Fpolygon>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215256593_flowchart-v2-circleEnd\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"11\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11\" markerHeight=\"11\" 
orient=\"auto\">\u003Ccircle cx=\"5\" cy=\"5\" r=\"5\" class=\"arrowMarkerPath\" style=\"stroke-width: 1; stroke-dasharray: 1, 0;\">\u003C\u002Fcircle>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215256593_flowchart-v2-circleStart\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"-1\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11\" markerHeight=\"11\" orient=\"auto\">\u003Ccircle cx=\"5\" cy=\"5\" r=\"5\" class=\"arrowMarkerPath\" style=\"stroke-width: 1; stroke-dasharray: 1, 0;\">\u003C\u002Fcircle>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215256593_flowchart-v2-circleEnd-margin\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refY=\"5\" refX=\"12.25\" markerUnits=\"userSpaceOnUse\" markerWidth=\"14\" markerHeight=\"14\" orient=\"auto\">\u003Ccircle cx=\"5\" cy=\"5\" r=\"5\" class=\"arrowMarkerPath\" style=\"stroke-width: 0; stroke-dasharray: 1, 0;\">\u003C\u002Fcircle>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215256593_flowchart-v2-circleStart-margin\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"-2\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"14\" markerHeight=\"14\" orient=\"auto\">\u003Ccircle cx=\"5\" cy=\"5\" r=\"5\" class=\"arrowMarkerPath\" style=\"stroke-width: 0; stroke-dasharray: 1, 0;\">\u003C\u002Fcircle>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215256593_flowchart-v2-crossEnd\" class=\"marker cross flowchart-v2\" viewBox=\"0 0 11 11\" refX=\"12\" refY=\"5.2\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11\" markerHeight=\"11\" orient=\"auto\">\u003Cpath d=\"M 1,1 l 9,9 M 10,1 l -9,9\" class=\"arrowMarkerPath\" style=\"stroke-width: 2; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215256593_flowchart-v2-crossStart\" class=\"marker cross flowchart-v2\" viewBox=\"0 0 11 11\" refX=\"-1\" refY=\"5.2\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11\" markerHeight=\"11\" orient=\"auto\">\u003Cpath d=\"M 1,1 l 9,9 M 10,1 l -9,9\" 
class=\"arrowMarkerPath\" style=\"stroke-width: 2; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215256593_flowchart-v2-crossEnd-margin\" class=\"marker cross flowchart-v2\" viewBox=\"0 0 15 15\" refX=\"17.7\" refY=\"7.5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"12\" markerHeight=\"12\" orient=\"auto\">\u003Cpath d=\"M 1,1 L 14,14 M 1,14 L 14,1\" class=\"arrowMarkerPath\" style=\"stroke-width: 2.5;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215256593_flowchart-v2-crossStart-margin\" class=\"marker cross flowchart-v2\" viewBox=\"0 0 15 15\" refX=\"-3.5\" refY=\"7.5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"12\" markerHeight=\"12\" orient=\"auto\">\u003Cpath d=\"M 1,1 L 14,14 M 1,14 L 14,1\" class=\"arrowMarkerPath\" style=\"stroke-width: 2.5; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cg class=\"root\">\u003Cg class=\"clusters\">\u003C\u002Fg>\u003Cg class=\"edgePaths\">\u003Cpath d=\"M202.906,35L207.073,35C211.24,35,219.573,35,227.24,35C234.906,35,241.906,35,245.406,35L248.906,35\" id=\"diagram-1775215256593-L_A_B_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_A_B_0\" data-points=\"W3sieCI6MjAyLjkwNjI1LCJ5IjozNX0seyJ4IjoyMjcuOTA2MjUsInkiOjM1fSx7IngiOjI1Mi45MDYyNSwieSI6MzV9XQ==\" data-look=\"classic\" marker-end=\"url(#diagram-1775215256593_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003Cpath d=\"M477.188,35L481.354,35C485.521,35,493.854,35,501.521,35C509.188,35,516.188,35,519.688,35L523.188,35\" id=\"diagram-1775215256593-L_B_C_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_B_C_0\" data-points=\"W3sieCI6NDc3LjE4NzUsInkiOjM1fSx7IngiOjUwMi4xODc1LCJ5IjozNX0seyJ4Ijo1MjcuMTg3NSwieSI6MzV9XQ==\" data-look=\"classic\" 
marker-end=\"url(#diagram-1775215256593_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003Cpath d=\"M729.766,35L733.932,35C738.099,35,746.432,35,754.099,35C761.766,35,768.766,35,772.266,35L775.766,35\" id=\"diagram-1775215256593-L_C_D_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_C_D_0\" data-points=\"W3sieCI6NzI5Ljc2NTYyNSwieSI6MzV9LHsieCI6NzU0Ljc2NTYyNSwieSI6MzV9LHsieCI6Nzc5Ljc2NTYyNSwieSI6MzV9XQ==\" data-look=\"classic\" marker-end=\"url(#diagram-1775215256593_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003Cpath d=\"M951.188,35L955.354,35C959.521,35,967.854,35,975.521,35C983.188,35,990.188,35,993.688,35L997.188,35\" id=\"diagram-1775215256593-L_D_E_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_D_E_0\" data-points=\"W3sieCI6OTUxLjE4NzUsInkiOjM1fSx7IngiOjk3Ni4xODc1LCJ5IjozNX0seyJ4IjoxMDAxLjE4NzUsInkiOjM1fV0=\" data-look=\"classic\" marker-end=\"url(#diagram-1775215256593_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003C\u002Fg>\u003Cg class=\"edgeLabels\">\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_A_B_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_B_C_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 
200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_C_D_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_D_E_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"nodes\">\u003Cg class=\"node default  \" id=\"diagram-1775215256593-flowchart-A-0\" data-look=\"classic\" transform=\"translate(105.453125, 35)\">\u003Crect class=\"basic label-container\" style=\"fill:#e5e7eb !important\" x=\"-97.453125\" y=\"-27\" width=\"194.90625\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"\" transform=\"translate(-67.453125, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"134.90625\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"nodeLabel \">\u003Cp>Quickstart Cluster\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215256593-flowchart-B-1\" data-look=\"classic\" 
transform=\"translate(365.046875, 35)\">\u003Crect class=\"basic label-container\" style=\"\" x=\"-112.140625\" y=\"-27\" width=\"224.28125\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"\" transform=\"translate(-82.140625, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"164.28125\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"nodeLabel \">\u003Cp>Intelligent Scheduling\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215256593-flowchart-C-3\" data-look=\"classic\" transform=\"translate(628.4765625, 35)\">\u003Crect class=\"basic label-container\" style=\"\" x=\"-101.2890625\" y=\"-27\" width=\"202.578125\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"\" transform=\"translate(-71.2890625, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"142.578125\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"nodeLabel \">\u003Cp>Prefill\u002FDecode Split\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215256593-flowchart-D-5\" data-look=\"classic\" transform=\"translate(865.4765625, 35)\">\u003Crect class=\"basic label-container\" style=\"\" x=\"-85.7109375\" y=\"-27\" width=\"171.421875\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"\" transform=\"translate(-55.7109375, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"111.421875\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" style=\"display: table-cell; white-space: nowrap; 
line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"nodeLabel \">\u003Cp>KV Cache Tests\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215256593-flowchart-E-7\" data-look=\"classic\" transform=\"translate(1112.3828125, 35)\">\u003Crect class=\"basic label-container\" style=\"fill:#22c55e !important\" x=\"-111.1953125\" y=\"-27\" width=\"222.390625\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"color:#fff !important\" transform=\"translate(-81.1953125, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"162.390625\" height=\"24\">\u003Cdiv style=\"color: rgb(255, 255, 255) !important; display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\" xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\">\u003Cspan style=\"color:#fff !important\" class=\"nodeLabel \">\u003Cp>Multi-tenant Platform\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003C\u002Fg>\u003C\u002Fg>\u003C\u002Fg>\u003Cdefs>\u003Cfilter id=\"diagram-1775215256593-drop-shadow\" height=\"130%\" width=\"130%\">\u003CfeDropShadow dx=\"4\" dy=\"4\" stdDeviation=\"0\" flood-opacity=\"0.06\" flood-color=\"#000000\">\u003C\u002FfeDropShadow>\u003C\u002Ffilter>\u003C\u002Fdefs>\u003Cdefs>\u003Cfilter id=\"diagram-1775215256593-drop-shadow-small\" height=\"150%\" width=\"150%\">\u003CfeDropShadow dx=\"2\" dy=\"2\" stdDeviation=\"0\" flood-opacity=\"0.06\" flood-color=\"#000000\">\u003C\u002FfeDropShadow>\u003C\u002Ffilter>\u003C\u002Fdefs>\u003Ctext x=\"1226.578125\" y=\"90\" text-anchor=\"end\" fill=\"#6b7280\" stroke=\"#ffffff\" stroke-width=\"3\" paint-order=\"stroke\" font-size=\"11\" font-family=\"system-ui, sans-serif\" opacity=\"0.7\">coreprose.com\u003C\u002Ftext>\u003C\u002Fsvg>\n\u003C\u002Fdiv>\n\nRed Hat’s guidance helps teams:\n\n- Validate KV cache 
aware routing.  \n- Measure latency and cost improvements against their own workloads.[7][8]\n\n### Community-driven evolution\n\nCloud Native FM discussions with Red Hat engineers frame llm-d as:\n\n- A practical toolset that strengthens Kubernetes for enterprise LLM inference, not a silver bullet.[2]  \n- A CNCF Sandbox project inviting contributions from operators, vendors, and researchers.[1][7]\n\nThis open contribution model helps llm-d keep pace with rapid shifts in:\n\n- Model architectures  \n- Accelerator types  \n- Workload patterns\n\n💼 **Section takeaway:** With opinionated docs, Helm recipes, and open governance, llm-d offers a low-friction path from first experiment to production-grade, multi-tenant LLM platforms.\n\n---\n\n## Conclusion: Turning Kubernetes into an AI Fabric\n\nBy contributing llm-d to CNCF, Red Hat and partners are defining a Kubernetes-native, vendor-neutral standard for distributed LLM inference across accelerators, topologies, and clouds.[1][3][7]  \n\nPlatform teams can manage GPUs, KV caches, and cluster fabric as programmable resources within the same ecosystem that standardized containers and microservices.\n\n⚡ **Call to action:**  \nPlatform teams should:\n\n- Pilot llm-d using official guides and Helm recipes.[9]  \n- Benchmark KV-cache-aware routing and disaggregated serving against current stacks.[8]  \n- Engage with the CNCF llm-d community to influence features and roadmap as generative AI evolves.[2]\n\nEarly adopters will both help shape and benefit from the next generation of cloud native AI infrastructure.","\u003Cp>Red Hat’s contribution of llm-d to the CNCF Sandbox makes Kubernetes a first-class platform for LLM inference, not just a “good enough” runtime.\u003Ca href=\"#source-1\" class=\"citation-link\" title=\"View source [1]\">[1]\u003C\u002Fa>\u003C\u002Fp>\n\u003Cp>By treating accelerators, topology, and KV cache as programmable resources, llm-d turns existing Kubernetes clusters into shared AI fabrics instead of isolated inference 
stacks.\u003Ca href=\"#source-4\" class=\"citation-link\" title=\"View source [4]\">[4]\u003C\u002Fa>\u003Ca href=\"#source-7\" class=\"citation-link\" title=\"View source [7]\">[7]\u003C\u002Fa>\u003C\u002Fp>\n\u003Cp>💡 \u003Cstrong>Key idea:\u003C\u002Fstrong> llm-d makes LLM inference a cloud native workload governed by open standards and CNCF processes, not vendor-specific systems.\u003Ca href=\"#source-1\" class=\"citation-link\" title=\"View source [1]\">[1]\u003C\u002Fa>\u003C\u002Fp>\n\u003Chr>\n\u003Ch2>1. Why llm-d Matters for Kubernetes and CNCF\u003C\u002Fh2>\n\u003Cp>llm-d’s CNCF Sandbox status anchors LLM inference in neutral, open governance similar to Kubernetes itself.\u003Ca href=\"#source-1\" class=\"citation-link\" title=\"View source [1]\">[1]\u003C\u002Fa>\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Ensures APIs, patterns, and scheduling semantics evolve under Linux Foundation stewardship.\u003C\u002Fli>\n\u003Cli>Reduces lock-in risk versus proprietary inference platforms.\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>The project’s origins highlight broad neutrality:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Launched in May 2025 by Red Hat, Google Cloud, IBM Research, CoreWeave, and NVIDIA.\u003Ca href=\"#source-1\" class=\"citation-link\" title=\"View source [1]\">[1]\u003C\u002Fa>\u003C\u002Fli>\n\u003Cli>Expanded to AMD, Cisco, Hugging Face, Intel, Lambda, Mistral AI, and universities.\u003Ca href=\"#source-1\" class=\"citation-link\" title=\"View source [1]\">[1]\u003C\u002Fa>\u003Ca href=\"#source-10\" class=\"citation-link\" title=\"View source [10]\">[10]\u003C\u002Fa>\u003C\u002Fli>\n\u003Cli>Signals alignment on a shared, Kubernetes-native inference approach.\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>⚡ \u003Cstrong>Strategic shift:\u003C\u002Fstrong> Designed for “any model, any accelerator, any cloud,” targeting heterogeneous, multi-cloud clusters with GPUs, TPUs, and custom ASICs.\u003Ca href=\"#source-1\" class=\"citation-link\" title=\"View source 
[1]\">[1]\u003C\u002Fa>\u003Ca href=\"#source-3\" class=\"citation-link\" title=\"View source [3]\">[3]\u003C\u002Fa>\u003C\u002Fp>\n\u003Cp>llm-d is:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>A vehicle to evolve Kubernetes into state-of-the-art AI infrastructure.\u003Ca href=\"#source-1\" class=\"citation-link\" title=\"View source [1]\">[1]\u003C\u002Fa>\u003C\u002Fli>\n\u003Cli>Focused on production serving: performance per dollar, multi-tenancy, and SLOs.\u003Ca href=\"#source-7\" class=\"citation-link\" title=\"View source [7]\">[7]\u003C\u002Fa>\u003Ca href=\"#source-9\" class=\"citation-link\" title=\"View source [9]\">[9]\u003C\u002Fa>\u003C\u002Fli>\n\u003Cli>Aimed at platform\u002FDevOps teams, not just researchers.\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>💼 \u003Cstrong>Section takeaway:\u003C\u002Fstrong> With llm-d in CNCF, Kubernetes becomes the default place to standardize LLM serving, scheduling, and optimization across vendors and clouds.\u003C\u002Fp>\n\u003Chr>\n\u003Ch2>2. Core Architecture: Distributed Inference Built for Kubernetes\u003C\u002Fh2>\n\u003Cp>llm-d provides a Kubernetes-native architecture for distributed inference, built on vLLM plus an inference scheduler, cache-aware routing, and disaggregated serving.\u003Ca href=\"#source-2\" class=\"citation-link\" title=\"View source [2]\">[2]\u003C\u002Fa>\u003Ca href=\"#source-7\" class=\"citation-link\" title=\"View source [7]\">[7]\u003C\u002Fa> It embeds into Kubernetes rather than replacing it.\u003C\u002Fp>\n\u003Ch3>Disaggregated prefill and decode\u003C\u002Fh3>\n\u003Cp>Inference is split into two phases:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>\u003Cstrong>Prefill:\u003C\u002Fstrong> Compute-heavy, builds KV cache for input tokens.\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Decode:\u003C\u002Fstrong> Memory-bandwidth-bound, consumes KV cache to generate tokens.\u003Ca href=\"#source-8\" class=\"citation-link\" title=\"View source 
[8]\">[8]\u003C\u002Fa>\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>llm-d can run these on different replicas and accelerator types, so GPUs are used where they matter instead of over-provisioning every pod.\u003Ca href=\"#source-3\" class=\"citation-link\" title=\"View source [3]\">[3]\u003C\u002Fa>\u003Ca href=\"#source-8\" class=\"citation-link\" title=\"View source [8]\">[8]\u003C\u002Fa>\u003C\u002Fp>\n\u003Cdiv class=\"mermaid-diagram not-prose my-6\" role=\"img\" aria-label=\"Diagram\">\n\u003Csvg id=\"diagram-1775215254784\" width=\"100%\" xmlns=\"http:\u002F\u002Fwww.w3.org\u002F2000\u002Fsvg\" class=\"flowchart\" style=\"max-width: 1338.734375px;\" viewBox=\"0 0 1338.734375 95\" role=\"graphics-document document\" aria-roledescription=\"flowchart-v2\">\u003Cstyle>#diagram-1775215254784{font-family:system-ui,-apple-system,sans-serif;font-size:16px;fill:#333;}@keyframes edge-animation-frame{from{stroke-dashoffset:0;}}@keyframes dash{to{stroke-dashoffset:0;}}#diagram-1775215254784 .edge-animation-slow{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 50s linear infinite;stroke-linecap:round;}#diagram-1775215254784 .edge-animation-fast{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 20s linear infinite;stroke-linecap:round;}#diagram-1775215254784 .error-icon{fill:#552222;}#diagram-1775215254784 .error-text{fill:#552222;stroke:#552222;}#diagram-1775215254784 .edge-thickness-normal{stroke-width:1px;}#diagram-1775215254784 .edge-thickness-thick{stroke-width:3.5px;}#diagram-1775215254784 .edge-pattern-solid{stroke-dasharray:0;}#diagram-1775215254784 .edge-thickness-invisible{stroke-width:0;fill:none;}#diagram-1775215254784 .edge-pattern-dashed{stroke-dasharray:3;}#diagram-1775215254784 .edge-pattern-dotted{stroke-dasharray:2;}#diagram-1775215254784 .marker{fill:#333333;stroke:#333333;}#diagram-1775215254784 .marker.cross{stroke:#333333;}#diagram-1775215254784 
svg{font-family:system-ui,-apple-system,sans-serif;font-size:16px;}#diagram-1775215254784 p{margin:0;}#diagram-1775215254784 .label{font-family:system-ui,-apple-system,sans-serif;color:#333;}#diagram-1775215254784 .cluster-label text{fill:#333;}#diagram-1775215254784 .cluster-label span{color:#333;}#diagram-1775215254784 .cluster-label span p{background-color:transparent;}#diagram-1775215254784 .label text,#diagram-1775215254784 span{fill:#333;color:#333;}#diagram-1775215254784 .node rect,#diagram-1775215254784 .node circle,#diagram-1775215254784 .node ellipse,#diagram-1775215254784 .node polygon,#diagram-1775215254784 .node path{fill:#ECECFF;stroke:#9370DB;stroke-width:1px;}#diagram-1775215254784 .rough-node .label text,#diagram-1775215254784 .node .label text,#diagram-1775215254784 .image-shape .label,#diagram-1775215254784 .icon-shape .label{text-anchor:middle;}#diagram-1775215254784 .node .katex path{fill:#000;stroke:#000;stroke-width:1px;}#diagram-1775215254784 .rough-node .label,#diagram-1775215254784 .node .label,#diagram-1775215254784 .image-shape .label,#diagram-1775215254784 .icon-shape .label{text-align:center;}#diagram-1775215254784 .node.clickable{cursor:pointer;}#diagram-1775215254784 .root .anchor path{fill:#333333!important;stroke-width:0;stroke:#333333;}#diagram-1775215254784 .arrowheadPath{fill:#333333;}#diagram-1775215254784 .edgePath .path{stroke:#333333;stroke-width:1px;}#diagram-1775215254784 .flowchart-link{stroke:#333333;fill:none;}#diagram-1775215254784 .edgeLabel{background-color:rgba(232,232,232, 0.8);text-align:center;}#diagram-1775215254784 .edgeLabel p{background-color:rgba(232,232,232, 0.8);}#diagram-1775215254784 .edgeLabel rect{opacity:0.5;background-color:rgba(232,232,232, 0.8);fill:rgba(232,232,232, 0.8);}#diagram-1775215254784 .labelBkg{background-color:rgba(232, 232, 232, 0.5);}#diagram-1775215254784 .cluster rect{fill:#ffffde;stroke:#aaaa33;stroke-width:1px;}#diagram-1775215254784 .cluster text{fill:#333;}#diagram-1775215254784 
.cluster span{color:#333;}#diagram-1775215254784 div.mermaidTooltip{position:absolute;text-align:center;max-width:200px;padding:2px;font-family:system-ui,-apple-system,sans-serif;font-size:12px;background:hsl(80, 100%, 96.2745098039%);border:1px solid #aaaa33;border-radius:2px;pointer-events:none;z-index:100;}#diagram-1775215254784 .flowchartTitleText{text-anchor:middle;font-size:18px;fill:#333;}#diagram-1775215254784 rect.text{fill:none;stroke-width:0;}#diagram-1775215254784 .icon-shape,#diagram-1775215254784 .image-shape{background-color:rgba(232,232,232, 0.8);text-align:center;}#diagram-1775215254784 .icon-shape p,#diagram-1775215254784 .image-shape p{background-color:rgba(232,232,232, 0.8);padding:2px;}#diagram-1775215254784 .icon-shape .label rect,#diagram-1775215254784 .image-shape .label rect{opacity:0.5;background-color:rgba(232,232,232, 0.8);fill:rgba(232,232,232, 0.8);}#diagram-1775215254784 .label-icon{display:inline-block;height:1em;overflow:visible;vertical-align:-0.125em;}#diagram-1775215254784 .node .label-icon path{fill:currentColor;stroke:revert;stroke-width:revert;}#diagram-1775215254784 .node .neo-node{stroke:#9370DB;}#diagram-1775215254784 [data-look=\"neo\"].node rect,#diagram-1775215254784 [data-look=\"neo\"].cluster rect,#diagram-1775215254784 [data-look=\"neo\"].node polygon{stroke:#9370DB;filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215254784 [data-look=\"neo\"].node path{stroke:#9370DB;stroke-width:1px;}#diagram-1775215254784 [data-look=\"neo\"].node .outer-path{filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215254784 [data-look=\"neo\"].node .neo-line path{stroke:#9370DB;filter:none;}#diagram-1775215254784 [data-look=\"neo\"].node circle{stroke:#9370DB;filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215254784 [data-look=\"neo\"].node circle .state-start{fill:#000000;}#diagram-1775215254784 [data-look=\"neo\"].icon-shape .icon{fill:#9370DB;filter:drop-shadow(1px 2px 2px 
rgba(185, 185, 185, 1));}#diagram-1775215254784 [data-look=\"neo\"].icon-shape .icon-neo path{stroke:#9370DB;filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215254784 :root{--mermaid-font-family:system-ui,-apple-system,sans-serif;}\u003C\u002Fstyle>\u003Cg>\u003Cmarker id=\"diagram-1775215254784_flowchart-v2-pointEnd\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"5\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"8\" markerHeight=\"8\" orient=\"auto\">\u003Cpath d=\"M 0 0 L 10 5 L 0 10 z\" class=\"arrowMarkerPath\" style=\"stroke-width: 1; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215254784_flowchart-v2-pointStart\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"4.5\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"8\" markerHeight=\"8\" orient=\"auto\">\u003Cpath d=\"M 0 5 L 10 10 L 10 0 z\" class=\"arrowMarkerPath\" style=\"stroke-width: 1; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215254784_flowchart-v2-pointEnd-margin\" class=\"marker flowchart-v2\" viewBox=\"0 0 11.5 14\" refX=\"11.5\" refY=\"7\" markerUnits=\"userSpaceOnUse\" markerWidth=\"10.5\" markerHeight=\"14\" orient=\"auto\">\u003Cpath d=\"M 0 0 L 11.5 7 L 0 14 z\" class=\"arrowMarkerPath\" style=\"stroke-width: 0; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215254784_flowchart-v2-pointStart-margin\" class=\"marker flowchart-v2\" viewBox=\"0 0 11.5 14\" refX=\"1\" refY=\"7\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11.5\" markerHeight=\"14\" orient=\"auto\">\u003Cpolygon points=\"0,7 11.5,14 11.5,0\" class=\"arrowMarkerPath\" style=\"stroke-width: 0; stroke-dasharray: 1, 0;\">\u003C\u002Fpolygon>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215254784_flowchart-v2-circleEnd\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"11\" refY=\"5\" markerUnits=\"userSpaceOnUse\" 
markerWidth=\"11\" markerHeight=\"11\" orient=\"auto\">\u003Ccircle cx=\"5\" cy=\"5\" r=\"5\" class=\"arrowMarkerPath\" style=\"stroke-width: 1; stroke-dasharray: 1, 0;\">\u003C\u002Fcircle>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215254784_flowchart-v2-circleStart\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"-1\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11\" markerHeight=\"11\" orient=\"auto\">\u003Ccircle cx=\"5\" cy=\"5\" r=\"5\" class=\"arrowMarkerPath\" style=\"stroke-width: 1; stroke-dasharray: 1, 0;\">\u003C\u002Fcircle>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215254784_flowchart-v2-circleEnd-margin\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refY=\"5\" refX=\"12.25\" markerUnits=\"userSpaceOnUse\" markerWidth=\"14\" markerHeight=\"14\" orient=\"auto\">\u003Ccircle cx=\"5\" cy=\"5\" r=\"5\" class=\"arrowMarkerPath\" style=\"stroke-width: 0; stroke-dasharray: 1, 0;\">\u003C\u002Fcircle>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215254784_flowchart-v2-circleStart-margin\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"-2\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"14\" markerHeight=\"14\" orient=\"auto\">\u003Ccircle cx=\"5\" cy=\"5\" r=\"5\" class=\"arrowMarkerPath\" style=\"stroke-width: 0; stroke-dasharray: 1, 0;\">\u003C\u002Fcircle>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215254784_flowchart-v2-crossEnd\" class=\"marker cross flowchart-v2\" viewBox=\"0 0 11 11\" refX=\"12\" refY=\"5.2\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11\" markerHeight=\"11\" orient=\"auto\">\u003Cpath d=\"M 1,1 l 9,9 M 10,1 l -9,9\" class=\"arrowMarkerPath\" style=\"stroke-width: 2; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215254784_flowchart-v2-crossStart\" class=\"marker cross flowchart-v2\" viewBox=\"0 0 11 11\" refX=\"-1\" refY=\"5.2\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11\" markerHeight=\"11\" 
orient=\"auto\">\u003Cpath d=\"M 1,1 l 9,9 M 10,1 l -9,9\" class=\"arrowMarkerPath\" style=\"stroke-width: 2; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215254784_flowchart-v2-crossEnd-margin\" class=\"marker cross flowchart-v2\" viewBox=\"0 0 15 15\" refX=\"17.7\" refY=\"7.5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"12\" markerHeight=\"12\" orient=\"auto\">\u003Cpath d=\"M 1,1 L 14,14 M 1,14 L 14,1\" class=\"arrowMarkerPath\" style=\"stroke-width: 2.5;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215254784_flowchart-v2-crossStart-margin\" class=\"marker cross flowchart-v2\" viewBox=\"0 0 15 15\" refX=\"-3.5\" refY=\"7.5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"12\" markerHeight=\"12\" orient=\"auto\">\u003Cpath d=\"M 1,1 L 14,14 M 1,14 L 14,1\" class=\"arrowMarkerPath\" style=\"stroke-width: 2.5; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cg class=\"root\">\u003Cg class=\"clusters\">\u003C\u002Fg>\u003Cg class=\"edgePaths\">\u003Cpath d=\"M176.391,35L180.557,35C184.724,35,193.057,35,200.724,35C208.391,35,215.391,35,218.891,35L222.391,35\" id=\"diagram-1775215254784-L_A_B_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_A_B_0\" data-points=\"W3sieCI6MTc2LjM5MDYyNSwieSI6MzV9LHsieCI6MjAxLjM5MDYyNSwieSI6MzV9LHsieCI6MjI2LjM5MDYyNSwieSI6MzV9XQ==\" data-look=\"classic\" marker-end=\"url(#diagram-1775215254784_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003Cpath d=\"M427.125,35L431.292,35C435.458,35,443.792,35,451.458,35C459.125,35,466.125,35,469.625,35L473.125,35\" id=\"diagram-1775215254784-L_B_C_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_B_C_0\" 
data-points=\"W3sieCI6NDI3LjEyNSwieSI6MzV9LHsieCI6NDUyLjEyNSwieSI6MzV9LHsieCI6NDc3LjEyNSwieSI6MzV9XQ==\" data-look=\"classic\" marker-end=\"url(#diagram-1775215254784_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003Cpath d=\"M640.016,35L644.182,35C648.349,35,656.682,35,664.349,35C672.016,35,679.016,35,682.516,35L686.016,35\" id=\"diagram-1775215254784-L_C_D_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_C_D_0\" data-points=\"W3sieCI6NjQwLjAxNTYyNSwieSI6MzV9LHsieCI6NjY1LjAxNTYyNSwieSI6MzV9LHsieCI6NjkwLjAxNTYyNSwieSI6MzV9XQ==\" data-look=\"classic\" marker-end=\"url(#diagram-1775215254784_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003Cpath d=\"M863.078,35L867.245,35C871.411,35,879.745,35,887.411,35C895.078,35,902.078,35,905.578,35L909.078,35\" id=\"diagram-1775215254784-L_D_E_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_D_E_0\" data-points=\"W3sieCI6ODYzLjA3ODEyNSwieSI6MzV9LHsieCI6ODg4LjA3ODEyNSwieSI6MzV9LHsieCI6OTEzLjA3ODEyNSwieSI6MzV9XQ==\" data-look=\"classic\" marker-end=\"url(#diagram-1775215254784_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003Cpath d=\"M1090.016,35L1094.182,35C1098.349,35,1106.682,35,1114.349,35C1122.016,35,1129.016,35,1132.516,35L1136.016,35\" id=\"diagram-1775215254784-L_E_F_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_E_F_0\" data-points=\"W3sieCI6MTA5MC4wMTU2MjUsInkiOjM1fSx7IngiOjExMTUuMDE1NjI1LCJ5IjozNX0seyJ4IjoxMTQwLjAxNTYyNSwieSI6MzV9XQ==\" data-look=\"classic\" marker-end=\"url(#diagram-1775215254784_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003C\u002Fg>\u003Cg class=\"edgeLabels\">\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_A_B_0\" 
transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_B_C_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_C_D_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_D_E_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_E_F_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" 
class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"nodes\">\u003Cg class=\"node default  \" id=\"diagram-1775215254784-flowchart-A-0\" data-look=\"classic\" transform=\"translate(92.1953125, 35)\">\u003Crect class=\"basic label-container\" style=\"\" x=\"-84.1953125\" y=\"-27\" width=\"168.390625\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"\" transform=\"translate(-54.1953125, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"108.390625\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"nodeLabel \">\u003Cp>Client Request\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215254784-flowchart-B-1\" data-look=\"classic\" transform=\"translate(326.7578125, 35)\">\u003Crect class=\"basic label-container\" style=\"\" x=\"-100.3671875\" y=\"-27\" width=\"200.734375\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"\" transform=\"translate(-70.3671875, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"140.734375\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"nodeLabel \">\u003Cp>Inference Gateway\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215254784-flowchart-C-3\" data-look=\"classic\" transform=\"translate(558.5703125, 35)\">\u003Crect class=\"basic label-container\" 
style=\"fill:#22c55e !important\" x=\"-81.4453125\" y=\"-27\" width=\"162.890625\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"color:#fff !important\" transform=\"translate(-51.4453125, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"102.890625\" height=\"24\">\u003Cdiv style=\"color: rgb(255, 255, 255) !important; display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\" xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\">\u003Cspan style=\"color:#fff !important\" class=\"nodeLabel \">\u003Cp>Prefill Servers\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215254784-flowchart-D-5\" data-look=\"classic\" transform=\"translate(776.546875, 35)\">\u003Crect class=\"basic label-container\" style=\"\" x=\"-86.53125\" y=\"-27\" width=\"173.0625\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"\" transform=\"translate(-56.53125, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"113.0625\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"nodeLabel \">\u003Cp>KV Cache Store\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215254784-flowchart-E-7\" data-look=\"classic\" transform=\"translate(1001.546875, 35)\">\u003Crect class=\"basic label-container\" style=\"fill:#0ea5e9 !important\" x=\"-88.46875\" y=\"-27\" width=\"176.9375\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"color:#fff !important\" transform=\"translate(-58.46875, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"116.9375\" height=\"24\">\u003Cdiv style=\"color: rgb(255, 255, 255) !important; display: table-cell; 
white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\" xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\">\u003Cspan style=\"color:#fff !important\" class=\"nodeLabel \">\u003Cp>Decode Servers\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215254784-flowchart-F-9\" data-look=\"classic\" transform=\"translate(1235.375, 35)\">\u003Crect class=\"basic label-container\" style=\"\" x=\"-95.359375\" y=\"-27\" width=\"190.71875\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"\" transform=\"translate(-65.359375, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"130.71875\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"nodeLabel \">\u003Cp>Response Stream\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003C\u002Fg>\u003C\u002Fg>\u003C\u002Fg>\u003Cdefs>\u003Cfilter id=\"diagram-1775215254784-drop-shadow\" height=\"130%\" width=\"130%\">\u003CfeDropShadow dx=\"4\" dy=\"4\" stdDeviation=\"0\" flood-opacity=\"0.06\" flood-color=\"#000000\">\u003C\u002FfeDropShadow>\u003C\u002Ffilter>\u003C\u002Fdefs>\u003Cdefs>\u003Cfilter id=\"diagram-1775215254784-drop-shadow-small\" height=\"150%\" width=\"150%\">\u003CfeDropShadow dx=\"2\" dy=\"2\" stdDeviation=\"0\" flood-opacity=\"0.06\" flood-color=\"#000000\">\u003C\u002FfeDropShadow>\u003C\u002Ffilter>\u003C\u002Fdefs>\u003Ctext x=\"1333.734375\" y=\"90\" text-anchor=\"end\" fill=\"#6b7280\" stroke=\"#ffffff\" stroke-width=\"3\" paint-order=\"stroke\" font-size=\"11\" font-family=\"system-ui, sans-serif\" opacity=\"0.7\">coreprose.com\u003C\u002Ftext>\u003C\u002Fsvg>\n\u003C\u002Fdiv>\n\u003Cp>📊 \u003Cstrong>Architecture insight:\u003C\u002Fstrong> 
Disaggregation replaces “one big GPU per pod” with a tunable pipeline per phase, workload, and accelerator.\u003Ca href=\"#source-3\" class=\"citation-link\" title=\"View source [3]\">[3]\u003C\u002Fa>\u003Ca href=\"#source-8\" class=\"citation-link\" title=\"View source [8]\">[8]\u003C\u002Fa>\u003C\u002Fp>\n\u003Ch3>Integration with Inference Gateway\u003C\u002Fh3>\n\u003Cp>llm-d integrates with the Kubernetes Inference Gateway (IGW):\u003Ca href=\"#source-2\" class=\"citation-link\" title=\"View source [2]\">[2]\u003C\u002Fa>\u003Ca href=\"#source-7\" class=\"citation-link\" title=\"View source [7]\">[7]\u003C\u002Fa>\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Applications call a stable gateway API.\u003C\u002Fli>\n\u003Cli>Platform teams optimize routing, placement, and scaling internally.\u003C\u002Fli>\n\u003Cli>Models, policies, and accelerator layouts can change without touching app code.\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Ch3>Topology-aware scheduling\u003C\u002Fh3>\n\u003Cp>The scheduler understands:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>GPU peer-to-peer connectivity\u003C\u002Fli>\n\u003Cli>NUMA layout and local memory bandwidth\u003C\u002Fli>\n\u003Cli>Network fabrics and cross-node bandwidth\u003Ca href=\"#source-3\" class=\"citation-link\" title=\"View source [3]\">[3]\u003C\u002Fa>\u003Ca href=\"#source-10\" class=\"citation-link\" title=\"View source [10]\">[10]\u003C\u002Fa>\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>Using this topology, llm-d:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Routes requests to meet latency SLOs at lowest cost.\u003C\u002Fli>\n\u003Cli>Avoids naive balancing by CPU or generic utilization.\u003Ca href=\"#source-3\" class=\"citation-link\" title=\"View source [3]\">[3]\u003C\u002Fa>\u003Ca href=\"#source-10\" class=\"citation-link\" title=\"View source [10]\">[10]\u003C\u002Fa>\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>Guides and Helm recipes provide “well-lit paths” for deploying llm-d across tens or hundreds of nodes, single- or 
multi-tenant.\u003Ca href=\"#source-9\" class=\"citation-link\" title=\"View source [9]\">[9]\u003C\u002Fa>\u003C\u002Fp>\n\u003Cp>⚠️ \u003Cstrong>Section takeaway:\u003C\u002Fstrong> llm-d makes inference architecture a native Kubernetes concern, combining vLLM, IGW, and topology-aware scheduling into a reproducible stack.\u003C\u002Fp>\n\u003Chr>\n\u003Ch2>3. Performance and Cost Optimizations for Enterprise LLMs\u003C\u002Fh2>\n\u003Cp>llm-d focuses on levers that determine whether LLMs are economically viable at scale.\u003C\u002Fp>\n\u003Ch3>KV cache aware routing\u003C\u002Fh3>\n\u003Cp>KV cache aware routing sends follow-up or similar prompts to cache-warm nodes, avoiding repeated prefill work.\u003Ca href=\"#source-2\" class=\"citation-link\" title=\"View source [2]\">[2]\u003C\u002Fa>\u003Ca href=\"#source-7\" class=\"citation-link\" title=\"View source [7]\">[7]\u003C\u002Fa>\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Especially valuable for multi-step prompts, agents, and RAG.\u003C\u002Fli>\n\u003Cli>Reduces tail latency and jitter.\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cdiv class=\"mermaid-diagram not-prose my-6\" role=\"img\" aria-label=\"Diagram\">\n\u003Csvg id=\"diagram-1775215255436\" width=\"100%\" xmlns=\"http:\u002F\u002Fwww.w3.org\u002F2000\u002Fsvg\" class=\"flowchart\" style=\"max-width: 924.671875px;\" viewBox=\"0 0 924.671875 199\" role=\"graphics-document document\" aria-roledescription=\"flowchart-v2\">\u003Cstyle>#diagram-1775215255436{font-family:system-ui,-apple-system,sans-serif;font-size:16px;fill:#333;}@keyframes edge-animation-frame{from{stroke-dashoffset:0;}}@keyframes dash{to{stroke-dashoffset:0;}}#diagram-1775215255436 .edge-animation-slow{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 50s linear infinite;stroke-linecap:round;}#diagram-1775215255436 .edge-animation-fast{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 20s linear infinite;stroke-linecap:round;}#diagram-1775215255436 
.error-icon{fill:#552222;}#diagram-1775215255436 .error-text{fill:#552222;stroke:#552222;}#diagram-1775215255436 .edge-thickness-normal{stroke-width:1px;}#diagram-1775215255436 .edge-thickness-thick{stroke-width:3.5px;}#diagram-1775215255436 .edge-pattern-solid{stroke-dasharray:0;}#diagram-1775215255436 .edge-thickness-invisible{stroke-width:0;fill:none;}#diagram-1775215255436 .edge-pattern-dashed{stroke-dasharray:3;}#diagram-1775215255436 .edge-pattern-dotted{stroke-dasharray:2;}#diagram-1775215255436 .marker{fill:#333333;stroke:#333333;}#diagram-1775215255436 .marker.cross{stroke:#333333;}#diagram-1775215255436 svg{font-family:system-ui,-apple-system,sans-serif;font-size:16px;}#diagram-1775215255436 p{margin:0;}#diagram-1775215255436 .label{font-family:system-ui,-apple-system,sans-serif;color:#333;}#diagram-1775215255436 .cluster-label text{fill:#333;}#diagram-1775215255436 .cluster-label span{color:#333;}#diagram-1775215255436 .cluster-label span p{background-color:transparent;}#diagram-1775215255436 .label text,#diagram-1775215255436 span{fill:#333;color:#333;}#diagram-1775215255436 .node rect,#diagram-1775215255436 .node circle,#diagram-1775215255436 .node ellipse,#diagram-1775215255436 .node polygon,#diagram-1775215255436 .node path{fill:#ECECFF;stroke:#9370DB;stroke-width:1px;}#diagram-1775215255436 .rough-node .label text,#diagram-1775215255436 .node .label text,#diagram-1775215255436 .image-shape .label,#diagram-1775215255436 .icon-shape .label{text-anchor:middle;}#diagram-1775215255436 .node .katex path{fill:#000;stroke:#000;stroke-width:1px;}#diagram-1775215255436 .rough-node .label,#diagram-1775215255436 .node .label,#diagram-1775215255436 .image-shape .label,#diagram-1775215255436 .icon-shape .label{text-align:center;}#diagram-1775215255436 .node.clickable{cursor:pointer;}#diagram-1775215255436 .root .anchor path{fill:#333333!important;stroke-width:0;stroke:#333333;}#diagram-1775215255436 .arrowheadPath{fill:#333333;}#diagram-1775215255436 .edgePath 
.path{stroke:#333333;stroke-width:1px;}#diagram-1775215255436 .flowchart-link{stroke:#333333;fill:none;}#diagram-1775215255436 .edgeLabel{background-color:rgba(232,232,232, 0.8);text-align:center;}#diagram-1775215255436 .edgeLabel p{background-color:rgba(232,232,232, 0.8);}#diagram-1775215255436 .edgeLabel rect{opacity:0.5;background-color:rgba(232,232,232, 0.8);fill:rgba(232,232,232, 0.8);}#diagram-1775215255436 .labelBkg{background-color:rgba(232, 232, 232, 0.5);}#diagram-1775215255436 .cluster rect{fill:#ffffde;stroke:#aaaa33;stroke-width:1px;}#diagram-1775215255436 .cluster text{fill:#333;}#diagram-1775215255436 .cluster span{color:#333;}#diagram-1775215255436 div.mermaidTooltip{position:absolute;text-align:center;max-width:200px;padding:2px;font-family:system-ui,-apple-system,sans-serif;font-size:12px;background:hsl(80, 100%, 96.2745098039%);border:1px solid #aaaa33;border-radius:2px;pointer-events:none;z-index:100;}#diagram-1775215255436 .flowchartTitleText{text-anchor:middle;font-size:18px;fill:#333;}#diagram-1775215255436 rect.text{fill:none;stroke-width:0;}#diagram-1775215255436 .icon-shape,#diagram-1775215255436 .image-shape{background-color:rgba(232,232,232, 0.8);text-align:center;}#diagram-1775215255436 .icon-shape p,#diagram-1775215255436 .image-shape p{background-color:rgba(232,232,232, 0.8);padding:2px;}#diagram-1775215255436 .icon-shape .label rect,#diagram-1775215255436 .image-shape .label rect{opacity:0.5;background-color:rgba(232,232,232, 0.8);fill:rgba(232,232,232, 0.8);}#diagram-1775215255436 .label-icon{display:inline-block;height:1em;overflow:visible;vertical-align:-0.125em;}#diagram-1775215255436 .node .label-icon path{fill:currentColor;stroke:revert;stroke-width:revert;}#diagram-1775215255436 .node .neo-node{stroke:#9370DB;}#diagram-1775215255436 [data-look=\"neo\"].node rect,#diagram-1775215255436 [data-look=\"neo\"].cluster rect,#diagram-1775215255436 [data-look=\"neo\"].node polygon{stroke:#9370DB;filter:drop-shadow(1px 2px 2px rgba(185, 
185, 185, 1));}#diagram-1775215255436 [data-look=\"neo\"].node path{stroke:#9370DB;stroke-width:1px;}#diagram-1775215255436 [data-look=\"neo\"].node .outer-path{filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215255436 [data-look=\"neo\"].node .neo-line path{stroke:#9370DB;filter:none;}#diagram-1775215255436 [data-look=\"neo\"].node circle{stroke:#9370DB;filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215255436 [data-look=\"neo\"].node circle .state-start{fill:#000000;}#diagram-1775215255436 [data-look=\"neo\"].icon-shape .icon{fill:#9370DB;filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215255436 [data-look=\"neo\"].icon-shape .icon-neo path{stroke:#9370DB;filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215255436 :root{--mermaid-font-family:system-ui,-apple-system,sans-serif;}\u003C\u002Fstyle>\u003Cg>\u003Cmarker id=\"diagram-1775215255436_flowchart-v2-pointEnd\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"5\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"8\" markerHeight=\"8\" orient=\"auto\">\u003Cpath d=\"M 0 0 L 10 5 L 0 10 z\" class=\"arrowMarkerPath\" style=\"stroke-width: 1; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255436_flowchart-v2-pointStart\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"4.5\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"8\" markerHeight=\"8\" orient=\"auto\">\u003Cpath d=\"M 0 5 L 10 10 L 10 0 z\" class=\"arrowMarkerPath\" style=\"stroke-width: 1; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255436_flowchart-v2-pointEnd-margin\" class=\"marker flowchart-v2\" viewBox=\"0 0 11.5 14\" refX=\"11.5\" refY=\"7\" markerUnits=\"userSpaceOnUse\" markerWidth=\"10.5\" markerHeight=\"14\" orient=\"auto\">\u003Cpath d=\"M 0 0 L 11.5 7 L 0 14 z\" class=\"arrowMarkerPath\" style=\"stroke-width: 0; stroke-dasharray: 
1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255436_flowchart-v2-pointStart-margin\" class=\"marker flowchart-v2\" viewBox=\"0 0 11.5 14\" refX=\"1\" refY=\"7\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11.5\" markerHeight=\"14\" orient=\"auto\">\u003Cpolygon points=\"0,7 11.5,14 11.5,0\" class=\"arrowMarkerPath\" style=\"stroke-width: 0; stroke-dasharray: 1, 0;\">\u003C\u002Fpolygon>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255436_flowchart-v2-circleEnd\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"11\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11\" markerHeight=\"11\" orient=\"auto\">\u003Ccircle cx=\"5\" cy=\"5\" r=\"5\" class=\"arrowMarkerPath\" style=\"stroke-width: 1; stroke-dasharray: 1, 0;\">\u003C\u002Fcircle>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255436_flowchart-v2-circleStart\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"-1\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11\" markerHeight=\"11\" orient=\"auto\">\u003Ccircle cx=\"5\" cy=\"5\" r=\"5\" class=\"arrowMarkerPath\" style=\"stroke-width: 1; stroke-dasharray: 1, 0;\">\u003C\u002Fcircle>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255436_flowchart-v2-circleEnd-margin\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refY=\"5\" refX=\"12.25\" markerUnits=\"userSpaceOnUse\" markerWidth=\"14\" markerHeight=\"14\" orient=\"auto\">\u003Ccircle cx=\"5\" cy=\"5\" r=\"5\" class=\"arrowMarkerPath\" style=\"stroke-width: 0; stroke-dasharray: 1, 0;\">\u003C\u002Fcircle>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255436_flowchart-v2-circleStart-margin\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"-2\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"14\" markerHeight=\"14\" orient=\"auto\">\u003Ccircle cx=\"5\" cy=\"5\" r=\"5\" class=\"arrowMarkerPath\" style=\"stroke-width: 0; stroke-dasharray: 1, 0;\">\u003C\u002Fcircle>\u003C\u002Fmarker>\u003Cmarker 
id=\"diagram-1775215255436_flowchart-v2-crossEnd\" class=\"marker cross flowchart-v2\" viewBox=\"0 0 11 11\" refX=\"12\" refY=\"5.2\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11\" markerHeight=\"11\" orient=\"auto\">\u003Cpath d=\"M 1,1 l 9,9 M 10,1 l -9,9\" class=\"arrowMarkerPath\" style=\"stroke-width: 2; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255436_flowchart-v2-crossStart\" class=\"marker cross flowchart-v2\" viewBox=\"0 0 11 11\" refX=\"-1\" refY=\"5.2\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11\" markerHeight=\"11\" orient=\"auto\">\u003Cpath d=\"M 1,1 l 9,9 M 10,1 l -9,9\" class=\"arrowMarkerPath\" style=\"stroke-width: 2; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255436_flowchart-v2-crossEnd-margin\" class=\"marker cross flowchart-v2\" viewBox=\"0 0 15 15\" refX=\"17.7\" refY=\"7.5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"12\" markerHeight=\"12\" orient=\"auto\">\u003Cpath d=\"M 1,1 L 14,14 M 1,14 L 14,1\" class=\"arrowMarkerPath\" style=\"stroke-width: 2.5;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255436_flowchart-v2-crossStart-margin\" class=\"marker cross flowchart-v2\" viewBox=\"0 0 15 15\" refX=\"-3.5\" refY=\"7.5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"12\" markerHeight=\"12\" orient=\"auto\">\u003Cpath d=\"M 1,1 L 14,14 M 1,14 L 14,1\" class=\"arrowMarkerPath\" style=\"stroke-width: 2.5; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cg class=\"root\">\u003Cg class=\"clusters\">\u003C\u002Fg>\u003Cg class=\"edgePaths\">\u003Cpath d=\"M161.828,87L165.995,87C170.161,87,178.495,87,186.161,87C193.828,87,200.828,87,204.328,87L207.828,87\" id=\"diagram-1775215255436-L_A_B_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_A_B_0\" 
data-points=\"W3sieCI6MTYxLjgyODEyNSwieSI6ODd9LHsieCI6MTg2LjgyODEyNSwieSI6ODd9LHsieCI6MjExLjgyODEyNSwieSI6ODd9XQ==\" data-look=\"classic\" marker-end=\"url(#diagram-1775215255436_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003Cpath d=\"M322.248,64.873L332.182,59.894C342.116,54.916,361.984,44.958,377.497,39.979C393.01,35,404.169,35,409.749,35L415.328,35\" id=\"diagram-1775215255436-L_B_C_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_B_C_0\" data-points=\"W3sieCI6MzIyLjI0ODM5NDg2MzU2MzQsInkiOjY0Ljg3MzM5NDg2MzU2MzR9LHsieCI6MzgxLjg1MTU2MjUsInkiOjM1fSx7IngiOjQxOS4zMjgxMjUsInkiOjM1fV0=\" data-look=\"classic\" marker-end=\"url(#diagram-1775215255436_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003Cpath d=\"M322.248,109.127L332.182,114.106C342.116,119.084,361.984,129.042,378.901,134.021C395.818,139,409.784,139,416.767,139L423.75,139\" id=\"diagram-1775215255436-L_B_D_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_B_D_0\" data-points=\"W3sieCI6MzIyLjI0ODM5NDg2MzU2MzQsInkiOjEwOS4xMjY2MDUxMzY0MzY2fSx7IngiOjM4MS44NTE1NjI1LCJ5IjoxMzl9LHsieCI6NDI3Ljc1LCJ5IjoxMzl9XQ==\" data-look=\"classic\" marker-end=\"url(#diagram-1775215255436_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003Cpath d=\"M637.406,35L641.573,35C645.74,35,654.073,35,661.74,35C669.406,35,676.406,35,679.906,35L683.406,35\" id=\"diagram-1775215255436-L_C_E_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_C_E_0\" data-points=\"W3sieCI6NjM3LjQwNjI1LCJ5IjozNX0seyJ4Ijo2NjIuNDA2MjUsInkiOjM1fSx7IngiOjY4Ny40MDYyNSwieSI6MzV9XQ==\" data-look=\"classic\" marker-end=\"url(#diagram-1775215255436_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003Cpath 
d=\"M628.984,139L634.555,139C640.125,139,651.266,139,664.668,139C678.07,139,693.734,139,701.566,139L709.398,139\" id=\"diagram-1775215255436-L_D_F_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_D_F_0\" data-points=\"W3sieCI6NjI4Ljk4NDM3NSwieSI6MTM5fSx7IngiOjY2Mi40MDYyNSwieSI6MTM5fSx7IngiOjcxMy4zOTg0Mzc1LCJ5IjoxMzl9XQ==\" data-look=\"classic\" marker-end=\"url(#diagram-1775215255436_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003C\u002Fg>\u003Cg class=\"edgeLabels\">\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_A_B_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"edgeLabel\" transform=\"translate(381.8515625, 35)\">\u003Cg class=\"label\" data-id=\"L_B_C_0\" transform=\"translate(-12.4765625, -12)\">\u003CforeignObject width=\"24.953125\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003Cp>Yes\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"edgeLabel\" transform=\"translate(381.8515625, 139)\">\u003Cg class=\"label\" data-id=\"L_B_D_0\" transform=\"translate(-10.921875, -12)\">\u003CforeignObject width=\"21.84375\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: 
center;\">\u003Cspan class=\"edgeLabel \">\u003Cp>No\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_C_E_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_D_F_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"nodes\">\u003Cg class=\"node default  \" id=\"diagram-1775215255436-flowchart-A-0\" data-look=\"classic\" transform=\"translate(84.9140625, 87)\">\u003Crect class=\"basic label-container\" style=\"\" x=\"-76.9140625\" y=\"-27\" width=\"153.828125\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"\" transform=\"translate(-46.9140625, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"93.828125\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"nodeLabel \">\u003Cp>New Prompt\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215255436-flowchart-B-1\" data-look=\"classic\" 
transform=\"translate(278.1015625, 87)\">\u003Cpolygon points=\"66.2734375,0 132.546875,-66.2734375 66.2734375,-132.546875 0,-66.2734375\" class=\"label-container\" transform=\"translate(-65.7734375, 66.2734375)\">\u003C\u002Fpolygon>\u003Cg class=\"label\" style=\"\" transform=\"translate(-39.2734375, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"78.546875\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"nodeLabel \">\u003Cp>Cache Hit?\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215255436-flowchart-C-3\" data-look=\"classic\" transform=\"translate(528.3671875, 35)\">\u003Crect class=\"basic label-container\" style=\"fill:#22c55e !important\" x=\"-109.0390625\" y=\"-27\" width=\"218.078125\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"color:#fff !important\" transform=\"translate(-79.0390625, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"158.078125\" height=\"24\">\u003Cdiv style=\"color: rgb(255, 255, 255) !important; display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\" xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\">\u003Cspan style=\"color:#fff !important\" class=\"nodeLabel \">\u003Cp>Route to Warm Node\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215255436-flowchart-D-5\" data-look=\"classic\" transform=\"translate(528.3671875, 139)\">\u003Crect class=\"basic label-container\" style=\"\" x=\"-100.6171875\" y=\"-27\" width=\"201.234375\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"\" transform=\"translate(-70.6171875, 
-12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"141.234375\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"nodeLabel \">\u003Cp>Route to Any Node\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215255436-flowchart-E-7\" data-look=\"classic\" transform=\"translate(802.0390625, 35)\">\u003Crect class=\"basic label-container\" style=\"\" x=\"-114.6328125\" y=\"-27\" width=\"229.265625\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"\" transform=\"translate(-84.6328125, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"169.265625\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"nodeLabel \">\u003Cp>Low Latency Response\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215255436-flowchart-F-9\" data-look=\"classic\" transform=\"translate(802.0390625, 139)\">\u003Crect class=\"basic label-container\" style=\"fill:#f59e0b !important\" x=\"-88.640625\" y=\"-27\" width=\"177.28125\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"color:#fff !important\" transform=\"translate(-58.640625, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"117.28125\" height=\"24\">\u003Cdiv style=\"color: rgb(255, 255, 255) !important; display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\" xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\">\u003Cspan style=\"color:#fff !important\" class=\"nodeLabel \">\u003Cp>Prefill + 
Decode\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003C\u002Fg>\u003C\u002Fg>\u003C\u002Fg>\u003Cdefs>\u003Cfilter id=\"diagram-1775215255436-drop-shadow\" height=\"130%\" width=\"130%\">\u003CfeDropShadow dx=\"4\" dy=\"4\" stdDeviation=\"0\" flood-opacity=\"0.06\" flood-color=\"#000000\">\u003C\u002FfeDropShadow>\u003C\u002Ffilter>\u003C\u002Fdefs>\u003Cdefs>\u003Cfilter id=\"diagram-1775215255436-drop-shadow-small\" height=\"150%\" width=\"150%\">\u003CfeDropShadow dx=\"2\" dy=\"2\" stdDeviation=\"0\" flood-opacity=\"0.06\" flood-color=\"#000000\">\u003C\u002FfeDropShadow>\u003C\u002Ffilter>\u003C\u002Fdefs>\u003Ctext x=\"919.671875\" y=\"194\" text-anchor=\"end\" fill=\"#6b7280\" stroke=\"#ffffff\" stroke-width=\"3\" paint-order=\"stroke\" font-size=\"11\" font-family=\"system-ui, sans-serif\" opacity=\"0.7\">coreprose.com\u003C\u002Ftext>\u003C\u002Fsvg>\n\u003C\u002Fdiv>\n\u003Cp>📊 \u003Cstrong>Practical effect:\u003C\u002Fstrong> Users see better latency from cache-warm routing and higher GPU utilization by assigning accelerators to specific pipeline stages instead of cloning full stacks per replica.\u003Ca href=\"#source-2\" class=\"citation-link\" title=\"View source [2]\">[2]\u003C\u002Fa>\u003C\u002Fp>\n\u003Ch3>Disaggregated serving and workload-aware scheduling\u003C\u002Fh3>\n\u003Cp>Separating prefill and decode lets llm-d:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Reduce duplicate model state replication.\u003Ca href=\"#source-2\" class=\"citation-link\" title=\"View source [2]\">[2]\u003C\u002Fa>\u003Ca href=\"#source-8\" class=\"citation-link\" title=\"View source [8]\">[8]\u003C\u002Fa>\u003C\u002Fli>\n\u003Cli>Assign hardware by workload shape (short chat, long-context, large batch).\u003Ca href=\"#source-3\" class=\"citation-link\" title=\"View source [3]\">[3]\u003C\u002Fa>\u003Ca href=\"#source-8\" class=\"citation-link\" title=\"View source 
[8]\">[8]\u003C\u002Fa>\u003C\u002Fli>\n\u003Cli>Improve:\n\u003Cul>\n\u003Cli>\u003Cstrong>Cost per request\u003C\u002Fstrong> via fewer fully replicated servers.\u003Ca href=\"#source-2\" class=\"citation-link\" title=\"View source [2]\">[2]\u003C\u002Fa>\u003Ca href=\"#source-8\" class=\"citation-link\" title=\"View source [8]\">[8]\u003C\u002Fa>\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Time-to-first-token (TTFT)\u003C\u002Fstrong> with prefill-optimized nodes.\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Time-per-output-token (TPOT)\u003C\u002Fstrong> via stable decode pipelines.\u003Ca href=\"#source-9\" class=\"citation-link\" title=\"View source [9]\">[9]\u003C\u002Fa>\u003C\u002Fli>\n\u003C\u002Ful>\n\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>llm-d is tuned for:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Long-running multi-step prompts\u003C\u002Fli>\n\u003Cli>Retrieval-augmented generation\u003C\u002Fli>\n\u003Cli>Agentic workflows\u003Ca href=\"#source-6\" class=\"citation-link\" title=\"View source [6]\">[6]\u003C\u002Fa>\u003Ca href=\"#source-7\" class=\"citation-link\" title=\"View source [7]\">[7]\u003C\u002Fa>\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>These high-value enterprise patterns stress cache management and scheduling.\u003C\u002Fp>\n\u003Cp>Vendors like Mistral AI note that next-gen models (e.g., Mixture of Experts) require robust KV cache management and disaggregated serving—exactly llm-d’s focus.\u003Ca href=\"#source-1\" class=\"citation-link\" title=\"View source [1]\">[1]\u003C\u002Fa>\u003C\u002Fp>\n\u003Cp>💡 \u003Cstrong>Section takeaway:\u003C\u002Fstrong> llm-d exposes cache locality and phase-aware scheduling as explicit controls, turning raw accelerator capacity into better latency and lower cost for real workloads.\u003C\u002Fp>\n\u003Chr>\n\u003Ch2>4. Multi-Accelerator and Topology-Aware Inference\u003C\u002Fh2>\n\u003Cp>The same mechanisms also let llm-d treat heterogeneous hardware as one programmable pool. 
Modern clusters often mix:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>High-end GPUs for interactive chat\u003C\u002Fli>\n\u003Cli>Memory-rich accelerators for long-context reasoning\u003C\u002Fli>\n\u003Cli>Custom ASICs\u002FTPUs for batch or offline jobs\u003Ca href=\"#source-3\" class=\"citation-link\" title=\"View source [3]\">[3]\u003C\u002Fa>\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>llm-d offers:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>A unified recipe and scheduler that understands accelerator classes.\u003C\u002Fli>\n\u003Cli>Hardware selection based on workload pattern, not manual guesswork.\u003Ca href=\"#source-3\" class=\"citation-link\" title=\"View source [3]\">[3]\u003C\u002Fa>\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Ch3>Topology and interconnect awareness\u003C\u002Fh3>\n\u003Cp>llm-d surfaces interconnect details—from NUMA layouts to network fabrics and GPU peer-to-peer bandwidth—so communication-heavy workloads land where overhead is minimized.\u003Ca href=\"#source-3\" class=\"citation-link\" title=\"View source [3]\">[3]\u003C\u002Fa>\u003Ca href=\"#source-10\" class=\"citation-link\" title=\"View source [10]\">[10]\u003C\u002Fa>\u003C\u002Fp>\n\u003Cp>Expressed via Kubernetes primitives:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Node labels\u002Ftaints for accelerator type and topology\u003C\u002Fli>\n\u003Cli>Affinity\u002Fanti-affinity and scheduling constraints\u003C\u002Fli>\n\u003Cli>Standard observability for monitoring hot paths\u003Ca href=\"#source-3\" class=\"citation-link\" title=\"View source [3]\">[3]\u003C\u002Fa>\u003Ca href=\"#source-9\" class=\"citation-link\" title=\"View source [9]\">[9]\u003C\u002Fa>\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cdiv class=\"mermaid-diagram not-prose my-6\" role=\"img\" aria-label=\"Diagram\">\n\u003Csvg id=\"diagram-1775215255995\" width=\"100%\" xmlns=\"http:\u002F\u002Fwww.w3.org\u002F2000\u002Fsvg\" class=\"flowchart\" style=\"max-width: 746.40625px;\" viewBox=\"0 0 746.40625 403.359375\" role=\"graphics-document 
document\" aria-roledescription=\"flowchart-v2\">\u003Cstyle>#diagram-1775215255995{font-family:system-ui,-apple-system,sans-serif;font-size:16px;fill:#333;}@keyframes edge-animation-frame{from{stroke-dashoffset:0;}}@keyframes dash{to{stroke-dashoffset:0;}}#diagram-1775215255995 .edge-animation-slow{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 50s linear infinite;stroke-linecap:round;}#diagram-1775215255995 .edge-animation-fast{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 20s linear infinite;stroke-linecap:round;}#diagram-1775215255995 .error-icon{fill:#552222;}#diagram-1775215255995 .error-text{fill:#552222;stroke:#552222;}#diagram-1775215255995 .edge-thickness-normal{stroke-width:1px;}#diagram-1775215255995 .edge-thickness-thick{stroke-width:3.5px;}#diagram-1775215255995 .edge-pattern-solid{stroke-dasharray:0;}#diagram-1775215255995 .edge-thickness-invisible{stroke-width:0;fill:none;}#diagram-1775215255995 .edge-pattern-dashed{stroke-dasharray:3;}#diagram-1775215255995 .edge-pattern-dotted{stroke-dasharray:2;}#diagram-1775215255995 .marker{fill:#333333;stroke:#333333;}#diagram-1775215255995 .marker.cross{stroke:#333333;}#diagram-1775215255995 svg{font-family:system-ui,-apple-system,sans-serif;font-size:16px;}#diagram-1775215255995 p{margin:0;}#diagram-1775215255995 .label{font-family:system-ui,-apple-system,sans-serif;color:#333;}#diagram-1775215255995 .cluster-label text{fill:#333;}#diagram-1775215255995 .cluster-label span{color:#333;}#diagram-1775215255995 .cluster-label span p{background-color:transparent;}#diagram-1775215255995 .label text,#diagram-1775215255995 span{fill:#333;color:#333;}#diagram-1775215255995 .node rect,#diagram-1775215255995 .node circle,#diagram-1775215255995 .node ellipse,#diagram-1775215255995 .node polygon,#diagram-1775215255995 .node path{fill:#ECECFF;stroke:#9370DB;stroke-width:1px;}#diagram-1775215255995 .rough-node .label text,#diagram-1775215255995 .node .label 
text,#diagram-1775215255995 .image-shape .label,#diagram-1775215255995 .icon-shape .label{text-anchor:middle;}#diagram-1775215255995 .node .katex path{fill:#000;stroke:#000;stroke-width:1px;}#diagram-1775215255995 .rough-node .label,#diagram-1775215255995 .node .label,#diagram-1775215255995 .image-shape .label,#diagram-1775215255995 .icon-shape .label{text-align:center;}#diagram-1775215255995 .node.clickable{cursor:pointer;}#diagram-1775215255995 .root .anchor path{fill:#333333!important;stroke-width:0;stroke:#333333;}#diagram-1775215255995 .arrowheadPath{fill:#333333;}#diagram-1775215255995 .edgePath .path{stroke:#333333;stroke-width:1px;}#diagram-1775215255995 .flowchart-link{stroke:#333333;fill:none;}#diagram-1775215255995 .edgeLabel{background-color:rgba(232,232,232, 0.8);text-align:center;}#diagram-1775215255995 .edgeLabel p{background-color:rgba(232,232,232, 0.8);}#diagram-1775215255995 .edgeLabel rect{opacity:0.5;background-color:rgba(232,232,232, 0.8);fill:rgba(232,232,232, 0.8);}#diagram-1775215255995 .labelBkg{background-color:rgba(232, 232, 232, 0.5);}#diagram-1775215255995 .cluster rect{fill:#ffffde;stroke:#aaaa33;stroke-width:1px;}#diagram-1775215255995 .cluster text{fill:#333;}#diagram-1775215255995 .cluster span{color:#333;}#diagram-1775215255995 div.mermaidTooltip{position:absolute;text-align:center;max-width:200px;padding:2px;font-family:system-ui,-apple-system,sans-serif;font-size:12px;background:hsl(80, 100%, 96.2745098039%);border:1px solid #aaaa33;border-radius:2px;pointer-events:none;z-index:100;}#diagram-1775215255995 .flowchartTitleText{text-anchor:middle;font-size:18px;fill:#333;}#diagram-1775215255995 rect.text{fill:none;stroke-width:0;}#diagram-1775215255995 .icon-shape,#diagram-1775215255995 .image-shape{background-color:rgba(232,232,232, 0.8);text-align:center;}#diagram-1775215255995 .icon-shape p,#diagram-1775215255995 .image-shape p{background-color:rgba(232,232,232, 0.8);padding:2px;}#diagram-1775215255995 .icon-shape .label 
rect,#diagram-1775215255995 .image-shape .label rect{opacity:0.5;background-color:rgba(232,232,232, 0.8);fill:rgba(232,232,232, 0.8);}#diagram-1775215255995 .label-icon{display:inline-block;height:1em;overflow:visible;vertical-align:-0.125em;}#diagram-1775215255995 .node .label-icon path{fill:currentColor;stroke:revert;stroke-width:revert;}#diagram-1775215255995 .node .neo-node{stroke:#9370DB;}#diagram-1775215255995 [data-look=\"neo\"].node rect,#diagram-1775215255995 [data-look=\"neo\"].cluster rect,#diagram-1775215255995 [data-look=\"neo\"].node polygon{stroke:#9370DB;filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215255995 [data-look=\"neo\"].node path{stroke:#9370DB;stroke-width:1px;}#diagram-1775215255995 [data-look=\"neo\"].node .outer-path{filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215255995 [data-look=\"neo\"].node .neo-line path{stroke:#9370DB;filter:none;}#diagram-1775215255995 [data-look=\"neo\"].node circle{stroke:#9370DB;filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215255995 [data-look=\"neo\"].node circle .state-start{fill:#000000;}#diagram-1775215255995 [data-look=\"neo\"].icon-shape .icon{fill:#9370DB;filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215255995 [data-look=\"neo\"].icon-shape .icon-neo path{stroke:#9370DB;filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215255995 :root{--mermaid-font-family:system-ui,-apple-system,sans-serif;}\u003C\u002Fstyle>\u003Cg>\u003Cmarker id=\"diagram-1775215255995_flowchart-v2-pointEnd\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"5\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"8\" markerHeight=\"8\" orient=\"auto\">\u003Cpath d=\"M 0 0 L 10 5 L 0 10 z\" class=\"arrowMarkerPath\" style=\"stroke-width: 1; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255995_flowchart-v2-pointStart\" class=\"marker flowchart-v2\" viewBox=\"0 0 
10 10\" refX=\"4.5\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"8\" markerHeight=\"8\" orient=\"auto\">\u003Cpath d=\"M 0 5 L 10 10 L 10 0 z\" class=\"arrowMarkerPath\" style=\"stroke-width: 1; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255995_flowchart-v2-pointEnd-margin\" class=\"marker flowchart-v2\" viewBox=\"0 0 11.5 14\" refX=\"11.5\" refY=\"7\" markerUnits=\"userSpaceOnUse\" markerWidth=\"10.5\" markerHeight=\"14\" orient=\"auto\">\u003Cpath d=\"M 0 0 L 11.5 7 L 0 14 z\" class=\"arrowMarkerPath\" style=\"stroke-width: 0; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255995_flowchart-v2-pointStart-margin\" class=\"marker flowchart-v2\" viewBox=\"0 0 11.5 14\" refX=\"1\" refY=\"7\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11.5\" markerHeight=\"14\" orient=\"auto\">\u003Cpolygon points=\"0,7 11.5,14 11.5,0\" class=\"arrowMarkerPath\" style=\"stroke-width: 0; stroke-dasharray: 1, 0;\">\u003C\u002Fpolygon>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255995_flowchart-v2-circleEnd\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"11\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11\" markerHeight=\"11\" orient=\"auto\">\u003Ccircle cx=\"5\" cy=\"5\" r=\"5\" class=\"arrowMarkerPath\" style=\"stroke-width: 1; stroke-dasharray: 1, 0;\">\u003C\u002Fcircle>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255995_flowchart-v2-circleStart\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"-1\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11\" markerHeight=\"11\" orient=\"auto\">\u003Ccircle cx=\"5\" cy=\"5\" r=\"5\" class=\"arrowMarkerPath\" style=\"stroke-width: 1; stroke-dasharray: 1, 0;\">\u003C\u002Fcircle>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255995_flowchart-v2-circleEnd-margin\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refY=\"5\" refX=\"12.25\" 
markerUnits=\"userSpaceOnUse\" markerWidth=\"14\" markerHeight=\"14\" orient=\"auto\">\u003Ccircle cx=\"5\" cy=\"5\" r=\"5\" class=\"arrowMarkerPath\" style=\"stroke-width: 0; stroke-dasharray: 1, 0;\">\u003C\u002Fcircle>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255995_flowchart-v2-circleStart-margin\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"-2\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"14\" markerHeight=\"14\" orient=\"auto\">\u003Ccircle cx=\"5\" cy=\"5\" r=\"5\" class=\"arrowMarkerPath\" style=\"stroke-width: 0; stroke-dasharray: 1, 0;\">\u003C\u002Fcircle>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255995_flowchart-v2-crossEnd\" class=\"marker cross flowchart-v2\" viewBox=\"0 0 11 11\" refX=\"12\" refY=\"5.2\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11\" markerHeight=\"11\" orient=\"auto\">\u003Cpath d=\"M 1,1 l 9,9 M 10,1 l -9,9\" class=\"arrowMarkerPath\" style=\"stroke-width: 2; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255995_flowchart-v2-crossStart\" class=\"marker cross flowchart-v2\" viewBox=\"0 0 11 11\" refX=\"-1\" refY=\"5.2\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11\" markerHeight=\"11\" orient=\"auto\">\u003Cpath d=\"M 1,1 l 9,9 M 10,1 l -9,9\" class=\"arrowMarkerPath\" style=\"stroke-width: 2; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255995_flowchart-v2-crossEnd-margin\" class=\"marker cross flowchart-v2\" viewBox=\"0 0 15 15\" refX=\"17.7\" refY=\"7.5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"12\" markerHeight=\"12\" orient=\"auto\">\u003Cpath d=\"M 1,1 L 14,14 M 1,14 L 14,1\" class=\"arrowMarkerPath\" style=\"stroke-width: 2.5;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215255995_flowchart-v2-crossStart-margin\" class=\"marker cross flowchart-v2\" viewBox=\"0 0 15 15\" refX=\"-3.5\" refY=\"7.5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"12\" 
markerHeight=\"12\" orient=\"auto\">\u003Cpath d=\"M 1,1 L 14,14 M 1,14 L 14,1\" class=\"arrowMarkerPath\" style=\"stroke-width: 2.5; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cg class=\"root\">\u003Cg class=\"clusters\">\u003C\u002Fg>\u003Cg class=\"edgePaths\">\u003Cpath d=\"M274.461,52.489L246.214,58.241C217.966,63.993,161.471,75.496,133.224,90.214C104.977,104.932,104.977,122.865,104.977,131.831L104.977,140.797\" id=\"diagram-1775215255995-L_A_B_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_A_B_0\" data-points=\"W3sieCI6Mjc0LjQ2MDkzNzUsInkiOjUyLjQ4OTIzMTUyMjI3MTE3fSx7IngiOjEwNC45NzY1NjI1LCJ5Ijo4N30seyJ4IjoxMDQuOTc2NTYyNSwieSI6MTQ0Ljc5Njg3NX1d\" data-look=\"classic\" marker-end=\"url(#diagram-1775215255995_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003Cpath d=\"M360.352,62L360.352,66.167C360.352,70.333,360.352,78.667,360.352,86.333C360.352,94,360.352,101,360.352,104.5L360.352,108\" id=\"diagram-1775215255995-L_A_C_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_A_C_0\" data-points=\"W3sieCI6MzYwLjM1MTU2MjUsInkiOjYyfSx7IngiOjM2MC4zNTE1NjI1LCJ5Ijo4N30seyJ4IjozNjAuMzUxNTYyNSwieSI6MTEyfV0=\" data-look=\"classic\" marker-end=\"url(#diagram-1775215255995_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003Cpath d=\"M446.242,51.651L476.632,57.543C507.021,63.434,567.799,75.217,598.189,89.411C628.578,103.604,628.578,120.208,628.578,128.51L628.578,136.813\" id=\"diagram-1775215255995-L_A_D_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_A_D_0\" 
data-points=\"W3sieCI6NDQ2LjI0MjE4NzUsInkiOjUxLjY1MTI2ODQ1ODkxNzA3fSx7IngiOjYyOC41NzgxMjUsInkiOjg3fSx7IngiOjYyOC41NzgxMjUsInkiOjE0MC44MTI1fV0=\" data-look=\"classic\" marker-end=\"url(#diagram-1775215255995_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003Cpath d=\"M104.977,233.563L104.977,243.195C104.977,252.828,104.977,272.094,104.977,285.227C104.977,298.359,104.977,305.359,104.977,308.859L104.977,312.359\" id=\"diagram-1775215255995-L_B_E_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_B_E_0\" data-points=\"W3sieCI6MTA0Ljk3NjU2MjUsInkiOjIzMy41NjI1fSx7IngiOjEwNC45NzY1NjI1LCJ5IjoyOTEuMzU5Mzc1fSx7IngiOjEwNC45NzY1NjI1LCJ5IjozMTYuMzU5Mzc1fV0=\" data-look=\"classic\" marker-end=\"url(#diagram-1775215255995_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003Cpath d=\"M360.352,266.359L360.352,270.526C360.352,274.693,360.352,283.026,360.352,290.693C360.352,298.359,360.352,305.359,360.352,308.859L360.352,312.359\" id=\"diagram-1775215255995-L_C_F_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_C_F_0\" data-points=\"W3sieCI6MzYwLjM1MTU2MjUsInkiOjI2Ni4zNTkzNzV9LHsieCI6MzYwLjM1MTU2MjUsInkiOjI5MS4zNTkzNzV9LHsieCI6MzYwLjM1MTU2MjUsInkiOjMxNi4zNTkzNzV9XQ==\" data-look=\"classic\" marker-end=\"url(#diagram-1775215255995_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003Cpath d=\"M628.578,237.547L628.578,246.516C628.578,255.484,628.578,273.422,628.578,285.891C628.578,298.359,628.578,305.359,628.578,308.859L628.578,312.359\" id=\"diagram-1775215255995-L_D_G_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_D_G_0\" 
data-points=\"W3sieCI6NjI4LjU3ODEyNSwieSI6MjM3LjU0Njg3NX0seyJ4Ijo2MjguNTc4MTI1LCJ5IjoyOTEuMzU5Mzc1fSx7IngiOjYyOC41NzgxMjUsInkiOjMxNi4zNTkzNzV9XQ==\" data-look=\"classic\" marker-end=\"url(#diagram-1775215255995_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003C\u002Fg>\u003Cg class=\"edgeLabels\">\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_A_B_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_A_C_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_A_D_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_B_E_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 
200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_C_F_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_D_G_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"nodes\">\u003Cg class=\"node default  \" id=\"diagram-1775215255995-flowchart-A-0\" data-look=\"classic\" transform=\"translate(360.3515625, 35)\">\u003Crect class=\"basic label-container\" style=\"\" x=\"-85.890625\" y=\"-27\" width=\"171.78125\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"\" transform=\"translate(-55.890625, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"111.78125\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"nodeLabel \">\u003Cp>Workload Type\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215255995-flowchart-B-1\" data-look=\"classic\" 
transform=\"translate(104.9765625, 189.1796875)\">\u003Cpolygon points=\"44.3828125,0 88.765625,-44.3828125 44.3828125,-88.765625 0,-44.3828125\" class=\"label-container\" transform=\"translate(-43.8828125, 44.3828125)\">\u003C\u002Fpolygon>\u003Cg class=\"label\" style=\"\" transform=\"translate(-17.3828125, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"34.765625\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"nodeLabel \">\u003Cp>Chat\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215255995-flowchart-C-3\" data-look=\"classic\" transform=\"translate(360.3515625, 189.1796875)\">\u003Cpolygon points=\"77.1796875,0 154.359375,-77.1796875 77.1796875,-154.359375 0,-77.1796875\" class=\"label-container\" transform=\"translate(-76.6796875, 77.1796875)\">\u003C\u002Fpolygon>\u003Cg class=\"label\" style=\"\" transform=\"translate(-50.1796875, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"100.359375\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"nodeLabel \">\u003Cp>Long Context\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215255995-flowchart-D-5\" data-look=\"classic\" transform=\"translate(628.578125, 189.1796875)\">\u003Cpolygon points=\"48.3671875,0 96.734375,-48.3671875 48.3671875,-96.734375 0,-48.3671875\" class=\"label-container\" transform=\"translate(-47.8671875, 48.3671875)\">\u003C\u002Fpolygon>\u003Cg class=\"label\" style=\"\" transform=\"translate(-21.3671875, 
-12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"42.734375\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"nodeLabel \">\u003Cp>Batch\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215255995-flowchart-E-7\" data-look=\"classic\" transform=\"translate(104.9765625, 343.359375)\">\u003Crect class=\"basic label-container\" style=\"fill:#22c55e !important\" x=\"-96.9765625\" y=\"-27\" width=\"193.953125\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"color:#fff !important\" transform=\"translate(-66.9765625, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"133.953125\" height=\"24\">\u003Cdiv style=\"color: rgb(255, 255, 255) !important; display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\" xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\">\u003Cspan style=\"color:#fff !important\" class=\"nodeLabel \">\u003Cp>Low-latency GPUs\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215255995-flowchart-F-9\" data-look=\"classic\" transform=\"translate(360.3515625, 343.359375)\">\u003Crect class=\"basic label-container\" style=\"fill:#0ea5e9 !important\" x=\"-108.3984375\" y=\"-27\" width=\"216.796875\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"color:#fff !important\" transform=\"translate(-78.3984375, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"156.796875\" height=\"24\">\u003Cdiv style=\"color: rgb(255, 255, 255) !important; display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\" 
xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\">\u003Cspan style=\"color:#fff !important\" class=\"nodeLabel \">\u003Cp>High-memory Nodes\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215255995-flowchart-G-11\" data-look=\"classic\" transform=\"translate(628.578125, 343.359375)\">\u003Crect class=\"basic label-container\" style=\"fill:#f59e0b !important\" x=\"-109.828125\" y=\"-27\" width=\"219.65625\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"color:#fff !important\" transform=\"translate(-79.828125, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"159.65625\" height=\"24\">\u003Cdiv style=\"color: rgb(255, 255, 255) !important; display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\" xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\">\u003Cspan style=\"color:#fff !important\" class=\"nodeLabel \">\u003Cp>Cost-optimized ASICs\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003C\u002Fg>\u003C\u002Fg>\u003C\u002Fg>\u003Cdefs>\u003Cfilter id=\"diagram-1775215255995-drop-shadow\" height=\"130%\" width=\"130%\">\u003CfeDropShadow dx=\"4\" dy=\"4\" stdDeviation=\"0\" flood-opacity=\"0.06\" flood-color=\"#000000\">\u003C\u002FfeDropShadow>\u003C\u002Ffilter>\u003C\u002Fdefs>\u003Cdefs>\u003Cfilter id=\"diagram-1775215255995-drop-shadow-small\" height=\"150%\" width=\"150%\">\u003CfeDropShadow dx=\"2\" dy=\"2\" stdDeviation=\"0\" flood-opacity=\"0.06\" flood-color=\"#000000\">\u003C\u002FfeDropShadow>\u003C\u002Ffilter>\u003C\u002Fdefs>\u003Ctext x=\"741.40625\" y=\"398.359375\" text-anchor=\"end\" fill=\"#6b7280\" stroke=\"#ffffff\" stroke-width=\"3\" paint-order=\"stroke\" font-size=\"11\" font-family=\"system-ui, sans-serif\" opacity=\"0.7\">coreprose.com\u003C\u002Ftext>\u003C\u002Fsvg>\n\u003C\u002Fdiv>\n\u003Cp>📊 
\u003Cstrong>Planning aid:\u003C\u002Fstrong> Platform teams get a practical scorecard for mixing accelerators by workload—chat, long-context, batch—rather than guessing hardware purchases and placement.\u003Ca href=\"#source-3\" class=\"citation-link\" title=\"View source [3]\">[3]\u003C\u002Fa>\u003C\u002Fp>\n\u003Cp>This multi-accelerator strategy aligns with industry trends: GPU and CPU vendors back llm-d so their hardware participates in a standardized, open inference stack.\u003Ca href=\"#source-1\" class=\"citation-link\" title=\"View source [1]\">[1]\u003C\u002Fa>\u003Ca href=\"#source-10\" class=\"citation-link\" title=\"View source [10]\">[10]\u003C\u002Fa>\u003C\u002Fp>\n\u003Cp>⚡ \u003Cstrong>Section takeaway:\u003C\u002Fstrong> llm-d turns heterogeneous hardware and complex topology into declarative scheduling inputs, enabling portable, vendor-neutral AI fabrics.\u003C\u002Fp>\n\u003Chr>\n\u003Ch2>5. Adoption Path: From First Cluster to Production Platform\u003C\u002Fh2>\n\u003Cp>llm-d pairs advanced capabilities with a realistic adoption path.\u003C\u002Fp>\n\u003Ch3>From quickstart to optimized platforms\u003C\u002Fh3>\n\u003Cp>Official guides and Helm charts provide:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Tested, benchmarked recipes for high-performance deployments.\u003Ca href=\"#source-9\" class=\"citation-link\" title=\"View source [9]\">[9]\u003C\u002Fa>\u003C\u002Fli>\n\u003Cli>Requirements: only basic Kubernetes familiarity.\u003C\u002Fli>\n\u003Cli>Targets:\n\u003Cul>\n\u003Cli>Single-model deployments across tens\u002Fhundreds of nodes\u003C\u002Fli>\n\u003Cli>Multi-tenant model-as-a-service platforms sharing deployments\u003Ca href=\"#source-9\" class=\"citation-link\" title=\"View source [9]\">[9]\u003C\u002Fa>\u003C\u002Fli>\n\u003C\u002Ful>\n\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>The “well-lit path” includes curated configs for:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Intelligent inference 
scheduling\u003C\u002Fli>\n\u003Cli>Prefill\u002Fdecode disaggregation\u003C\u002Fli>\n\u003Cli>KV cache aware routing\u003Ca href=\"#source-9\" class=\"citation-link\" title=\"View source [9]\">[9]\u003C\u002Fa>\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cdiv class=\"mermaid-diagram not-prose my-6\" role=\"img\" aria-label=\"Diagram\">\n\u003Csvg id=\"diagram-1775215256593\" width=\"100%\" xmlns=\"http:\u002F\u002Fwww.w3.org\u002F2000\u002Fsvg\" class=\"flowchart\" style=\"max-width: 1231.578125px;\" viewBox=\"0 0 1231.578125 95\" role=\"graphics-document document\" aria-roledescription=\"flowchart-v2\">\u003Cstyle>#diagram-1775215256593{font-family:system-ui,-apple-system,sans-serif;font-size:16px;fill:#333;}@keyframes edge-animation-frame{from{stroke-dashoffset:0;}}@keyframes dash{to{stroke-dashoffset:0;}}#diagram-1775215256593 .edge-animation-slow{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 50s linear infinite;stroke-linecap:round;}#diagram-1775215256593 .edge-animation-fast{stroke-dasharray:9,5!important;stroke-dashoffset:900;animation:dash 20s linear infinite;stroke-linecap:round;}#diagram-1775215256593 .error-icon{fill:#552222;}#diagram-1775215256593 .error-text{fill:#552222;stroke:#552222;}#diagram-1775215256593 .edge-thickness-normal{stroke-width:1px;}#diagram-1775215256593 .edge-thickness-thick{stroke-width:3.5px;}#diagram-1775215256593 .edge-pattern-solid{stroke-dasharray:0;}#diagram-1775215256593 .edge-thickness-invisible{stroke-width:0;fill:none;}#diagram-1775215256593 .edge-pattern-dashed{stroke-dasharray:3;}#diagram-1775215256593 .edge-pattern-dotted{stroke-dasharray:2;}#diagram-1775215256593 .marker{fill:#333333;stroke:#333333;}#diagram-1775215256593 .marker.cross{stroke:#333333;}#diagram-1775215256593 svg{font-family:system-ui,-apple-system,sans-serif;font-size:16px;}#diagram-1775215256593 p{margin:0;}#diagram-1775215256593 .label{font-family:system-ui,-apple-system,sans-serif;color:#333;}#diagram-1775215256593 .cluster-label 
text{fill:#333;}#diagram-1775215256593 .cluster-label span{color:#333;}#diagram-1775215256593 .cluster-label span p{background-color:transparent;}#diagram-1775215256593 .label text,#diagram-1775215256593 span{fill:#333;color:#333;}#diagram-1775215256593 .node rect,#diagram-1775215256593 .node circle,#diagram-1775215256593 .node ellipse,#diagram-1775215256593 .node polygon,#diagram-1775215256593 .node path{fill:#ECECFF;stroke:#9370DB;stroke-width:1px;}#diagram-1775215256593 .rough-node .label text,#diagram-1775215256593 .node .label text,#diagram-1775215256593 .image-shape .label,#diagram-1775215256593 .icon-shape .label{text-anchor:middle;}#diagram-1775215256593 .node .katex path{fill:#000;stroke:#000;stroke-width:1px;}#diagram-1775215256593 .rough-node .label,#diagram-1775215256593 .node .label,#diagram-1775215256593 .image-shape .label,#diagram-1775215256593 .icon-shape .label{text-align:center;}#diagram-1775215256593 .node.clickable{cursor:pointer;}#diagram-1775215256593 .root .anchor path{fill:#333333!important;stroke-width:0;stroke:#333333;}#diagram-1775215256593 .arrowheadPath{fill:#333333;}#diagram-1775215256593 .edgePath .path{stroke:#333333;stroke-width:1px;}#diagram-1775215256593 .flowchart-link{stroke:#333333;fill:none;}#diagram-1775215256593 .edgeLabel{background-color:rgba(232,232,232, 0.8);text-align:center;}#diagram-1775215256593 .edgeLabel p{background-color:rgba(232,232,232, 0.8);}#diagram-1775215256593 .edgeLabel rect{opacity:0.5;background-color:rgba(232,232,232, 0.8);fill:rgba(232,232,232, 0.8);}#diagram-1775215256593 .labelBkg{background-color:rgba(232, 232, 232, 0.5);}#diagram-1775215256593 .cluster rect{fill:#ffffde;stroke:#aaaa33;stroke-width:1px;}#diagram-1775215256593 .cluster text{fill:#333;}#diagram-1775215256593 .cluster span{color:#333;}#diagram-1775215256593 div.mermaidTooltip{position:absolute;text-align:center;max-width:200px;padding:2px;font-family:system-ui,-apple-system,sans-serif;font-size:12px;background:hsl(80, 100%, 
96.2745098039%);border:1px solid #aaaa33;border-radius:2px;pointer-events:none;z-index:100;}#diagram-1775215256593 .flowchartTitleText{text-anchor:middle;font-size:18px;fill:#333;}#diagram-1775215256593 rect.text{fill:none;stroke-width:0;}#diagram-1775215256593 .icon-shape,#diagram-1775215256593 .image-shape{background-color:rgba(232,232,232, 0.8);text-align:center;}#diagram-1775215256593 .icon-shape p,#diagram-1775215256593 .image-shape p{background-color:rgba(232,232,232, 0.8);padding:2px;}#diagram-1775215256593 .icon-shape .label rect,#diagram-1775215256593 .image-shape .label rect{opacity:0.5;background-color:rgba(232,232,232, 0.8);fill:rgba(232,232,232, 0.8);}#diagram-1775215256593 .label-icon{display:inline-block;height:1em;overflow:visible;vertical-align:-0.125em;}#diagram-1775215256593 .node .label-icon path{fill:currentColor;stroke:revert;stroke-width:revert;}#diagram-1775215256593 .node .neo-node{stroke:#9370DB;}#diagram-1775215256593 [data-look=\"neo\"].node rect,#diagram-1775215256593 [data-look=\"neo\"].cluster rect,#diagram-1775215256593 [data-look=\"neo\"].node polygon{stroke:#9370DB;filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215256593 [data-look=\"neo\"].node path{stroke:#9370DB;stroke-width:1px;}#diagram-1775215256593 [data-look=\"neo\"].node .outer-path{filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215256593 [data-look=\"neo\"].node .neo-line path{stroke:#9370DB;filter:none;}#diagram-1775215256593 [data-look=\"neo\"].node circle{stroke:#9370DB;filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215256593 [data-look=\"neo\"].node circle .state-start{fill:#000000;}#diagram-1775215256593 [data-look=\"neo\"].icon-shape .icon{fill:#9370DB;filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215256593 [data-look=\"neo\"].icon-shape .icon-neo path{stroke:#9370DB;filter:drop-shadow(1px 2px 2px rgba(185, 185, 185, 1));}#diagram-1775215256593 
:root{--mermaid-font-family:system-ui,-apple-system,sans-serif;}\u003C\u002Fstyle>\u003Cg>\u003Cmarker id=\"diagram-1775215256593_flowchart-v2-pointEnd\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"5\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"8\" markerHeight=\"8\" orient=\"auto\">\u003Cpath d=\"M 0 0 L 10 5 L 0 10 z\" class=\"arrowMarkerPath\" style=\"stroke-width: 1; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215256593_flowchart-v2-pointStart\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"4.5\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"8\" markerHeight=\"8\" orient=\"auto\">\u003Cpath d=\"M 0 5 L 10 10 L 10 0 z\" class=\"arrowMarkerPath\" style=\"stroke-width: 1; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215256593_flowchart-v2-pointEnd-margin\" class=\"marker flowchart-v2\" viewBox=\"0 0 11.5 14\" refX=\"11.5\" refY=\"7\" markerUnits=\"userSpaceOnUse\" markerWidth=\"10.5\" markerHeight=\"14\" orient=\"auto\">\u003Cpath d=\"M 0 0 L 11.5 7 L 0 14 z\" class=\"arrowMarkerPath\" style=\"stroke-width: 0; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215256593_flowchart-v2-pointStart-margin\" class=\"marker flowchart-v2\" viewBox=\"0 0 11.5 14\" refX=\"1\" refY=\"7\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11.5\" markerHeight=\"14\" orient=\"auto\">\u003Cpolygon points=\"0,7 11.5,14 11.5,0\" class=\"arrowMarkerPath\" style=\"stroke-width: 0; stroke-dasharray: 1, 0;\">\u003C\u002Fpolygon>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215256593_flowchart-v2-circleEnd\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"11\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11\" markerHeight=\"11\" orient=\"auto\">\u003Ccircle cx=\"5\" cy=\"5\" r=\"5\" class=\"arrowMarkerPath\" style=\"stroke-width: 1; stroke-dasharray: 1, 
0;\">\u003C\u002Fcircle>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215256593_flowchart-v2-circleStart\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"-1\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11\" markerHeight=\"11\" orient=\"auto\">\u003Ccircle cx=\"5\" cy=\"5\" r=\"5\" class=\"arrowMarkerPath\" style=\"stroke-width: 1; stroke-dasharray: 1, 0;\">\u003C\u002Fcircle>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215256593_flowchart-v2-circleEnd-margin\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refY=\"5\" refX=\"12.25\" markerUnits=\"userSpaceOnUse\" markerWidth=\"14\" markerHeight=\"14\" orient=\"auto\">\u003Ccircle cx=\"5\" cy=\"5\" r=\"5\" class=\"arrowMarkerPath\" style=\"stroke-width: 0; stroke-dasharray: 1, 0;\">\u003C\u002Fcircle>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215256593_flowchart-v2-circleStart-margin\" class=\"marker flowchart-v2\" viewBox=\"0 0 10 10\" refX=\"-2\" refY=\"5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"14\" markerHeight=\"14\" orient=\"auto\">\u003Ccircle cx=\"5\" cy=\"5\" r=\"5\" class=\"arrowMarkerPath\" style=\"stroke-width: 0; stroke-dasharray: 1, 0;\">\u003C\u002Fcircle>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215256593_flowchart-v2-crossEnd\" class=\"marker cross flowchart-v2\" viewBox=\"0 0 11 11\" refX=\"12\" refY=\"5.2\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11\" markerHeight=\"11\" orient=\"auto\">\u003Cpath d=\"M 1,1 l 9,9 M 10,1 l -9,9\" class=\"arrowMarkerPath\" style=\"stroke-width: 2; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215256593_flowchart-v2-crossStart\" class=\"marker cross flowchart-v2\" viewBox=\"0 0 11 11\" refX=\"-1\" refY=\"5.2\" markerUnits=\"userSpaceOnUse\" markerWidth=\"11\" markerHeight=\"11\" orient=\"auto\">\u003Cpath d=\"M 1,1 l 9,9 M 10,1 l -9,9\" class=\"arrowMarkerPath\" style=\"stroke-width: 2; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker 
id=\"diagram-1775215256593_flowchart-v2-crossEnd-margin\" class=\"marker cross flowchart-v2\" viewBox=\"0 0 15 15\" refX=\"17.7\" refY=\"7.5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"12\" markerHeight=\"12\" orient=\"auto\">\u003Cpath d=\"M 1,1 L 14,14 M 1,14 L 14,1\" class=\"arrowMarkerPath\" style=\"stroke-width: 2.5;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cmarker id=\"diagram-1775215256593_flowchart-v2-crossStart-margin\" class=\"marker cross flowchart-v2\" viewBox=\"0 0 15 15\" refX=\"-3.5\" refY=\"7.5\" markerUnits=\"userSpaceOnUse\" markerWidth=\"12\" markerHeight=\"12\" orient=\"auto\">\u003Cpath d=\"M 1,1 L 14,14 M 1,14 L 14,1\" class=\"arrowMarkerPath\" style=\"stroke-width: 2.5; stroke-dasharray: 1, 0;\">\u003C\u002Fpath>\u003C\u002Fmarker>\u003Cg class=\"root\">\u003Cg class=\"clusters\">\u003C\u002Fg>\u003Cg class=\"edgePaths\">\u003Cpath d=\"M202.906,35L207.073,35C211.24,35,219.573,35,227.24,35C234.906,35,241.906,35,245.406,35L248.906,35\" id=\"diagram-1775215256593-L_A_B_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_A_B_0\" data-points=\"W3sieCI6MjAyLjkwNjI1LCJ5IjozNX0seyJ4IjoyMjcuOTA2MjUsInkiOjM1fSx7IngiOjI1Mi45MDYyNSwieSI6MzV9XQ==\" data-look=\"classic\" marker-end=\"url(#diagram-1775215256593_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003Cpath d=\"M477.188,35L481.354,35C485.521,35,493.854,35,501.521,35C509.188,35,516.188,35,519.688,35L523.188,35\" id=\"diagram-1775215256593-L_B_C_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_B_C_0\" data-points=\"W3sieCI6NDc3LjE4NzUsInkiOjM1fSx7IngiOjUwMi4xODc1LCJ5IjozNX0seyJ4Ijo1MjcuMTg3NSwieSI6MzV9XQ==\" data-look=\"classic\" marker-end=\"url(#diagram-1775215256593_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003Cpath 
d=\"M729.766,35L733.932,35C738.099,35,746.432,35,754.099,35C761.766,35,768.766,35,772.266,35L775.766,35\" id=\"diagram-1775215256593-L_C_D_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_C_D_0\" data-points=\"W3sieCI6NzI5Ljc2NTYyNSwieSI6MzV9LHsieCI6NzU0Ljc2NTYyNSwieSI6MzV9LHsieCI6Nzc5Ljc2NTYyNSwieSI6MzV9XQ==\" data-look=\"classic\" marker-end=\"url(#diagram-1775215256593_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003Cpath d=\"M951.188,35L955.354,35C959.521,35,967.854,35,975.521,35C983.188,35,990.188,35,993.688,35L997.188,35\" id=\"diagram-1775215256593-L_D_E_0\" class=\" edge-thickness-normal edge-pattern-solid edge-thickness-normal edge-pattern-solid flowchart-link\" style=\";\" data-edge=\"true\" data-et=\"edge\" data-id=\"L_D_E_0\" data-points=\"W3sieCI6OTUxLjE4NzUsInkiOjM1fSx7IngiOjk3Ni4xODc1LCJ5IjozNX0seyJ4IjoxMDAxLjE4NzUsInkiOjM1fV0=\" data-look=\"classic\" marker-end=\"url(#diagram-1775215256593_flowchart-v2-pointEnd)\">\u003C\u002Fpath>\u003C\u002Fg>\u003Cg class=\"edgeLabels\">\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_A_B_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_B_C_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel 
\">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_C_D_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"edgeLabel\">\u003Cg class=\"label\" data-id=\"L_D_E_0\" transform=\"translate(0, 0)\">\u003CforeignObject width=\"0\" height=\"0\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" class=\"labelBkg\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"edgeLabel \">\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"nodes\">\u003Cg class=\"node default  \" id=\"diagram-1775215256593-flowchart-A-0\" data-look=\"classic\" transform=\"translate(105.453125, 35)\">\u003Crect class=\"basic label-container\" style=\"fill:#e5e7eb !important\" x=\"-97.453125\" y=\"-27\" width=\"194.90625\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"\" transform=\"translate(-67.453125, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"134.90625\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"nodeLabel \">\u003Cp>Quickstart Cluster\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215256593-flowchart-B-1\" data-look=\"classic\" transform=\"translate(365.046875, 35)\">\u003Crect 
class=\"basic label-container\" style=\"\" x=\"-112.140625\" y=\"-27\" width=\"224.28125\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"\" transform=\"translate(-82.140625, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"164.28125\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"nodeLabel \">\u003Cp>Intelligent Scheduling\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215256593-flowchart-C-3\" data-look=\"classic\" transform=\"translate(628.4765625, 35)\">\u003Crect class=\"basic label-container\" style=\"\" x=\"-101.2890625\" y=\"-27\" width=\"202.578125\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"\" transform=\"translate(-71.2890625, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"142.578125\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\">\u003Cspan class=\"nodeLabel \">\u003Cp>Prefill\u002FDecode Split\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215256593-flowchart-D-5\" data-look=\"classic\" transform=\"translate(865.4765625, 35)\">\u003Crect class=\"basic label-container\" style=\"\" x=\"-85.7109375\" y=\"-27\" width=\"171.421875\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"\" transform=\"translate(-55.7109375, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"111.421875\" height=\"24\">\u003Cdiv xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\" style=\"display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: 
center;\">\u003Cspan class=\"nodeLabel \">\u003Cp>KV Cache Tests\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003Cg class=\"node default  \" id=\"diagram-1775215256593-flowchart-E-7\" data-look=\"classic\" transform=\"translate(1112.3828125, 35)\">\u003Crect class=\"basic label-container\" style=\"fill:#22c55e !important\" x=\"-111.1953125\" y=\"-27\" width=\"222.390625\" height=\"54\">\u003C\u002Frect>\u003Cg class=\"label\" style=\"color:#fff !important\" transform=\"translate(-81.1953125, -12)\">\u003Crect>\u003C\u002Frect>\u003CforeignObject width=\"162.390625\" height=\"24\">\u003Cdiv style=\"color: rgb(255, 255, 255) !important; display: table-cell; white-space: nowrap; line-height: 1.5; max-width: 200px; text-align: center;\" xmlns=\"http:\u002F\u002Fwww.w3.org\u002F1999\u002Fxhtml\">\u003Cspan style=\"color:#fff !important\" class=\"nodeLabel \">\u003Cp>Multi-tenant Platform\u003C\u002Fp>\u003C\u002Fspan>\u003C\u002Fdiv>\u003C\u002FforeignObject>\u003C\u002Fg>\u003C\u002Fg>\u003C\u002Fg>\u003C\u002Fg>\u003C\u002Fg>\u003Cdefs>\u003Cfilter id=\"diagram-1775215256593-drop-shadow\" height=\"130%\" width=\"130%\">\u003CfeDropShadow dx=\"4\" dy=\"4\" stdDeviation=\"0\" flood-opacity=\"0.06\" flood-color=\"#000000\">\u003C\u002FfeDropShadow>\u003C\u002Ffilter>\u003C\u002Fdefs>\u003Cdefs>\u003Cfilter id=\"diagram-1775215256593-drop-shadow-small\" height=\"150%\" width=\"150%\">\u003CfeDropShadow dx=\"2\" dy=\"2\" stdDeviation=\"0\" flood-opacity=\"0.06\" flood-color=\"#000000\">\u003C\u002FfeDropShadow>\u003C\u002Ffilter>\u003C\u002Fdefs>\u003Ctext x=\"1226.578125\" y=\"90\" text-anchor=\"end\" fill=\"#6b7280\" stroke=\"#ffffff\" stroke-width=\"3\" paint-order=\"stroke\" font-size=\"11\" font-family=\"system-ui, sans-serif\" opacity=\"0.7\">coreprose.com\u003C\u002Ftext>\u003C\u002Fsvg>\n\u003C\u002Fdiv>\n\u003Cp>Red Hat’s guidance helps teams:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Validate KV cache aware 
routing.\u003C\u002Fli>\n\u003Cli>Measure latency and cost improvements against their own workloads.\u003Ca href=\"#source-7\" class=\"citation-link\" title=\"View source [7]\">[7]\u003C\u002Fa>\u003Ca href=\"#source-8\" class=\"citation-link\" title=\"View source [8]\">[8]\u003C\u002Fa>\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Ch3>Community-driven evolution\u003C\u002Fh3>\n\u003Cp>Cloud Native FM discussions with Red Hat engineers frame llm-d as:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>A practical toolset that strengthens Kubernetes for enterprise LLM inference, not a silver bullet.\u003Ca href=\"#source-2\" class=\"citation-link\" title=\"View source [2]\">[2]\u003C\u002Fa>\u003C\u002Fli>\n\u003Cli>A CNCF Sandbox project inviting contributions from operators, vendors, and researchers.\u003Ca href=\"#source-1\" class=\"citation-link\" title=\"View source [1]\">[1]\u003C\u002Fa>\u003Ca href=\"#source-7\" class=\"citation-link\" title=\"View source [7]\">[7]\u003C\u002Fa>\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>This ensures llm-d tracks rapid shifts in:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Model architectures\u003C\u002Fli>\n\u003Cli>Accelerator types\u003C\u002Fli>\n\u003Cli>Workload patterns\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>💼 \u003Cstrong>Section takeaway:\u003C\u002Fstrong> With opinionated docs, Helm recipes, and open governance, llm-d offers a low-friction path from first experiment to production-grade, multi-tenant LLM platforms.\u003C\u002Fp>\n\u003Chr>\n\u003Ch2>Conclusion: Turning Kubernetes into an AI Fabric\u003C\u002Fh2>\n\u003Cp>By contributing llm-d to CNCF, Red Hat and partners are defining a Kubernetes-native, vendor-neutral standard for distributed LLM inference across accelerators, topologies, and clouds.\u003Ca href=\"#source-1\" class=\"citation-link\" title=\"View source [1]\">[1]\u003C\u002Fa>\u003Ca href=\"#source-3\" class=\"citation-link\" title=\"View source [3]\">[3]\u003C\u002Fa>\u003Ca href=\"#source-7\" class=\"citation-link\" 
title=\"View source [7]\">[7]\u003C\u002Fa>\u003C\u002Fp>\n\u003Cp>Platform teams can manage GPUs, KV caches, and cluster fabric as programmable resources within the same ecosystem that standardized containers and microservices.\u003C\u002Fp>\n\u003Cp>⚡ \u003Cstrong>Call to action:\u003C\u002Fstrong>\u003Cbr>\nPlatform teams should:\u003C\u002Fp>\n\u003Cul>\n\u003Cli>Pilot llm-d using official guides and Helm recipes.\u003Ca href=\"#source-9\" class=\"citation-link\" title=\"View source [9]\">[9]\u003C\u002Fa>\u003C\u002Fli>\n\u003Cli>Benchmark KV cache-aware routing and disaggregated serving against current stacks.\u003Ca href=\"#source-8\" class=\"citation-link\" title=\"View source [8]\">[8]\u003C\u002Fa>\u003C\u002Fli>\n\u003Cli>Engage with the CNCF llm-d community to influence features and roadmap as generative AI evolves.\u003Ca href=\"#source-2\" class=\"citation-link\" title=\"View source [2]\">[2]\u003C\u002Fa>\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Cp>Early adopters will help shape, and benefit from, the next generation of cloud native AI infrastructure.\u003C\u002Fp>\n","Red Hat’s contribution of llm-d to the CNCF Sandbox makes Kubernetes a first-class platform for LLM inference, not just a “good enough” runtime.[1]  \n\nBy treating accelerators, topology, and KV cache...","trend-radar",[],1287,6,"2026-03-31T19:09:15.509Z",[17,22,26,30,34,38],{"title":18,"url":19,"summary":20,"type":21},"Welcome llm-d to the CNCF: Evolving Kubernetes into SOTA AI infrastructure | CNCF","https:\u002F\u002Fwww.cncf.io\u002Fblog\u002F2026\u002F03\u002F24\u002Fwelcome-llm-d-to-the-cncf-evolving-kubernetes-into-sota-ai-infrastructure\u002F","Welcome llm-d to the CNCF: Evolving Kubernetes into SOTA AI infrastructure\n\nPosted on March 24, 2026 by Carlos Costa (IBM Research), Clayton Coleman (Google), and Rob Shaw (Red Hat)\n\nWe are thrilled t...","kb",{"title":23,"url":24,"summary":25,"type":21},"How to Run LLMs on Kubernetes with llm-d: A Distributed Inference 
Stack","https:\u002F\u002Fwww.linkedin.com\u002Fposts\u002Fsaim-safder_cloudnativefm-cloudnative-ai-activity-7382369451372855296-z3ji","Saim Safder – Cloud Native FM on LinkedIn\n\nIs Kubernetes enough to run enterprise LLMs? It’s close, but only when paired with purpose-built layers. In this episode of Cloud Native FM, we introduce llm...",{"title":27,"url":28,"summary":29,"type":21},"Llm-d: Multi-Accelerator LLM Inference on Kubernetes - Erwan Gallen, Red Hat","https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=g8_snJA_ESU","Llm-d: Multi-Accelerator LLM Inference on Kubernetes - Erwan Gallen, Red Hat\n\nLarge language model serving has grown beyond one GPU per pod. Kubernetes clusters now mix GPUs, TPUs and custom AI ASICs,...",{"title":31,"url":32,"summary":33,"type":21},"Getting started with llm-d for distributed AI inference","https:\u002F\u002Fdevelopers.redhat.com\u002Farticles\u002F2025\u002F08\u002F19\u002Fgetting-started-llm-d-distributed-ai-inference","Getting started with llm-d for distributed AI inference\n\nllm-d: Kubernetes-native distributed inference stack for large-scale LLM applications\n\nAugust 19, 2025\n\nCedric Clyburn, Philip Hayes\n\nRelated t...",{"title":35,"url":36,"summary":37,"type":21},"Guides | llm-d","https:\u002F\u002Fllm-d.ai\u002Fdocs\u002Fguide","Our guides provide tested and benchmarked recipes and Helm charts to serve large language models (LLMs) at peak performance with best practices common to production deployments. 
A familiarity with bas...",{"title":39,"url":40,"summary":41,"type":21},"Deploying llm-d in Kubernetes: The Future of Distributed AI Inference at Scale","https:\u002F\u002Fthamizhelango.medium.com\u002Fdeploying-llm-d-in-kubernetes-the-future-of-distributed-ai-inference-at-scale-f3ff3eefeb1b","# Deploying llm-d in Kubernetes: The Future of Distributed AI Inference at Scale\n\nIntroduction\n\nllm-d is a new open source community project designed to enable scalable distributed generative AI infer...",null,{"generationDuration":44,"kbQueriesCount":45,"confidenceScore":46,"sourcesCount":14},103494,10,100,{"metaTitle":6,"metaDescription":10},"en","https:\u002F\u002Fimages.unsplash.com\u002Fphoto-1466992133056-ae8de8e22809?ixid=M3w4OTczNDl8MHwxfHNlYXJjaHwxfHxyZWQlMjBoYXQlMjBsbG0lMjBqb2luc3xlbnwxfDB8fHwxNzc0OTg0MTQ4fDA&ixlib=rb-4.1.0&w=1200&h=630&fit=crop&crop=entropy&auto=format,compress",{"photographerName":51,"photographerUrl":52,"unsplashUrl":53},"Jakob Owens","https:\u002F\u002Funsplash.com\u002F@jakobowens1?utm_source=coreprose&utm_medium=referral","https:\u002F\u002Funsplash.com\u002Fphotos\u002Fred-fitted-cap-yGeVdMgmFyE?utm_source=coreprose&utm_medium=referral",false,{"key":56,"name":57,"nameEn":57},"ai-engineering","AI Engineering & LLM Ops",[59,67,75,82],{"id":60,"title":61,"slug":62,"excerpt":63,"category":64,"featuredImage":65,"publishedAt":66},"6a2107893c5f4660db9f0265","Trump’s New AI Executive Order: What Early Federal Access to Models Would Mean for ML Engineering","trump-s-new-ai-executive-order-what-early-federal-access-to-models-would-mean-for-ml-engineering","Trump’s AI agenda treats “winning the AI race” as a geopolitical and economic necessity, prioritizing national and economic security over precautionary regulation. 
[1][9][10]  \n\nA likely next step is...","safety","https:\u002F\u002Fimages.unsplash.com\u002Fphoto-1612278920639-cfbae3835fee?ixid=M3w4OTczNDl8MHwxfHNlYXJjaHwxfHx0cnVtcCUyMG5ldyUyMGV4ZWN1dGl2ZSUyMG9yZGVyfGVufDF8MHx8fDE3ODA1NDk3Mjd8MA&ixlib=rb-4.1.0&w=1200&h=630&fit=crop&crop=entropy&auto=format,compress&q=60","2026-06-04T05:08:46.537Z",{"id":68,"title":69,"slug":70,"excerpt":71,"category":72,"featuredImage":73,"publishedAt":74},"6a2029363c5f4660db9ea488","How a Meta AI Support Bot Could Be Hijacked to Steal Instagram Accounts via Prompt Injection","how-a-meta-ai-support-bot-could-be-hijacked-to-steal-instagram-accounts-via-prompt-injection","An AI “support assistant” that can reset passwords, change recovery settings, and call internal Meta APIs is effectively a remote admin console behind a chat UI. When this console is driven by an LLM,...","hallucinations","https:\u002F\u002Fimages.unsplash.com\u002Fphoto-1689439518156-3659596b5c6c?ixid=M3w4OTczNDl8MHwxfHNlYXJjaHwxfHxtZXRhJTIwc3VwcG9ydCUyMGJvdCUyMGNvdWxkfGVufDF8MHx8fDE3ODA1MDk4OTd8MA&ixlib=rb-4.1.0&w=1200&h=630&fit=crop&crop=entropy&auto=format,compress&q=60","2026-06-03T13:25:18.479Z",{"id":76,"title":77,"slug":78,"excerpt":79,"category":72,"featuredImage":80,"publishedAt":81},"6a2026a23c5f4660db9ea392","Inside the Meta AI Support Bot Prompt Injection Hack: How Attackers Hijacked High-Profile Instagram Accounts","inside-the-meta-ai-support-bot-prompt-injection-hack-how-attackers-hijacked-high-profile-instagram-accounts","A fake “Meta Support” chat plus a few crafted messages is now enough to compromise accounts worth millions in brand equity.  
\n\nIn late 2025 and early 2026, creators reported losing control of high-fol...","https:\u002F\u002Fimages.unsplash.com\u002Fphoto-1689439518156-3659596b5c6c?ixid=M3w4OTczNDl8MHwxfHNlYXJjaHwxfHxpbnNpZGUlMjBtZXRhJTIwc3VwcG9ydCUyMGJvdHxlbnwxfDB8fHwxNzgwNTA5OTAwfDA&ixlib=rb-4.1.0&w=1200&h=630&fit=crop&crop=entropy&auto=format,compress&q=60","2026-06-03T13:14:46.959Z",{"id":83,"title":84,"slug":85,"excerpt":86,"category":72,"featuredImage":87,"publishedAt":88},"6a1fa7e86af3b6cc2a8c04b6","Inside Sysdig’s First Documented LLM-Agent-Driven Cyber Intrusion: An Engineering Playbook","inside-sysdig-s-first-documented-llm-agent-driven-cyber-intrusion-an-engineering-playbook","LLM agents just crossed a line. Sysdig’s report of what appears to be the first documented LLM‑agent‑driven intrusion shows an AI system not only assisting an attacker, but orchestrating an end‑to‑end...","https:\u002F\u002Fimages.unsplash.com\u002Fphoto-1573511860302-28c524319d2a?ixid=M3w4OTczNDl8MHwxfHNlYXJjaHwxfHxpbnNpZGUlMjBzeXNkaWclMjBmaXJzdCUyMGRvY3VtZW50ZWR8ZW58MXwwfHx8MTc4MDQ3NTYwOXww&ixlib=rb-4.1.0&w=1200&h=630&fit=crop&crop=entropy&auto=format,compress&q=60","2026-06-03T04:09:30.910Z",["Island",90],{"key":91,"params":92,"result":94},"ArticleBody_sZ31yyqQcGFZR1CNkSnSnKzb1kPr5rspwIGncimc",{"props":93},"{\"articleId\":\"69cc1b240e6c02b7816bdd82\",\"linkColor\":\"red\"}",{"head":95},{}]