{"id":22588,"date":"2025-11-06T22:55:05","date_gmt":"2025-11-06T22:55:05","guid":{"rendered":"https:\/\/pokecon.jp\/job\/?p=22588"},"modified":"2025-11-06T22:55:05","modified_gmt":"2025-11-06T22:55:05","slug":"vllm-sleep-mode%e3%82%88%e3%82%8b%e3%83%a2%e3%83%87%e3%83%ab%e3%81%ae%e3%82%bc%e3%83%ad%e3%83%aa%e3%83%ad%e3%83%bc%e3%83%89%e5%88%87%e3%82%8a%e6%9b%bf%e3%81%88%e6%a9%9f%e8%83%bd%e3%81%ae%e6%a4%9c","status":"publish","type":"post","link":"https:\/\/pokecon.jp\/job\/22588\/","title":{"rendered":"Verifying Zero-Reload Model Switching with vLLM Sleep Mode &#8211; NTT docomo Business Engineers&#8217; Blog"},"content":{"rendered":"\n<\/p>\n<div>\n<p>Hello, this is \u9732\u5d0e from NTT docomo Business. This post gives an overview of the zero-reload model switching feature of vLLM that was introduced on <a target=\"_blank\" href=\"https:\/\/blog.vllm.ai\/2025\/10\/26\/sleep-mode.html\">the official vLLM community blog<\/a>, and then reports the results of verifying the feature in a Container-based setup.<\/p>\n<p><a target=\"_blank\" 
href=\"https:\/\/docs.vllm.ai\/en\/latest\/\">vLLM<\/a> is open-source inference acceleration software developed by the Sky Computing Lab at the University of California, Berkeley. It implements a range of acceleration techniques, including quantization and batching, to speed up Large Language Model (LLM) inference. vLLM also supports many of the models published on the model repository <a target=\"_blank\" href=\"https:\/\/huggingface.co\/\">Hugging Face<\/a>, and it is used worldwide as a foundation for running open-weight models such as Meta's <a target=\"_blank\" 
href=\"https:\/\/www.llama.com\/\">Llama<\/a> at high speed.<\/p>\n<p>This post introduces a feature that removes the model-switching overhead that arises when you serve LLM inference with vLLM and want to use several models on the same GPU.<\/p>\n<p>vLLM is highly useful software that runs LLM inference quickly and returns results to users promptly.<br \/>\nTo achieve fast inference, vLLM reserves as much memory as it can in advance for the LLM's computation cache (a.k.a. 
KV Cache); by default it budgets 90% of the available GPU memory.<br \/>\nThis behavior causes no particular problem when serving a single model. It does mean, however, that a GPU, a scarce compute resource, ends up monopolized by one model.<\/p>\n<p>Because of this characteristic, use cases that switch between multiple models, such as the ones below, need GPU resources allocated per model, which drives up cost.<\/p>\n<ul>\n<li>A system that switches between an LLM that answers questions in text and a Vision Language Model (VLM) that describes or classifies images<\/li>\n<li>An agent that combines the outputs of several different LLMs to produce a final answer<\/li>\n<\/ul>\n<p>As a workaround, you can use the official <a target=\"_blank\" 
href=\"https:\/\/hub.docker.com\/r\/vllm\/vllm-openai\">Container image<\/a> provided by vLLM and switch models by starting and stopping Containers according to demand and timing.<br \/>\nHowever, because of the time needed to load the model from the filesystem and the overhead of the <a target=\"_blank\" href=\"https:\/\/docs.vllm.ai\/en\/latest\/design\/cuda_graphs.html\">CUDA Graph<\/a> construction that runs at startup as an optimization, even a relatively small model such as <a target=\"_blank\" href=\"https:\/\/huggingface.co\/google\/gemma-3-4b-it\">gemma-3-4b-it<\/a> incurs a startup wait of about one minute.<\/p>\n<p>One feature that addresses this problem is vLLM's Sleep Mode. Calling the <code>\/sleep<\/code> API on a running vLLM instance offloads the loaded model to CPU memory. Calling the <code>\/wake_up<\/code> API reloads the offloaded data into GPU memory. <code>\/wake_up<\/code> 
transfers memory from the CPU to the GPU, so loading is faster than reading the model from the filesystem, and it avoids the CUDA Graph construction overhead, so the API becomes available sooner than when the vLLM process is started from scratch.<\/p>\n<p><span itemscope=\"\" itemtype=\"http:\/\/schema.org\/Photograph\"><img decoding=\"async\" src=\"https:\/\/cdn-ak.f.st-hatena.com\/images\/fotolife\/N\/NTTCom\/20251105\/20251105180924.png\" width=\"1072\" height=\"964\" loading=\"lazy\" title=\"\" class=\"hatena-fotolife\" itemprop=\"image\"\/><\/span> <\/p>\n<blockquote>\n<p><strong>NOTE<\/strong><br \/>To use Sleep Mode, the environment variable <code>VLLM_SERVER_DEV_MODE=1<\/code> must be set.<\/p>\n<\/blockquote>\n<h2 id=\"Sleep-Level\">Sleep Level<\/h2>\n<p>The <code>\/sleep<\/code> API takes the sleep level as a query parameter (for example, <code>\/sleep?level=1<\/code>).<\/p>\n<p>There are currently two sleep levels:<\/p>\n<ul>\n<li>Level 1: offloads the model weights to CPU RAM and discards the KV Cache from GPU memory<\/li>\n<li>Level 2: discards both the model weights and the KV 
Cache from GPU memory, keeping CPU RAM usage during sleep to a minimum<\/li>\n<\/ul>\n<p>The <a target=\"_blank\" href=\"https:\/\/docs.vllm.ai\/en\/latest\/features\/sleep_mode.html\">official documentation<\/a> explains that Level 1 is suited to reusing the same model later, while Level 2 is suited to switching to a different model.<\/p>\n<p>The <a target=\"_blank\" href=\"https:\/\/blog.vllm.ai\/2025\/10\/26\/sleep-mode.html\">official vLLM community blog<\/a> and the <a target=\"_blank\" href=\"https:\/\/docs.vllm.ai\/en\/latest\/features\/sleep_mode.html\">official documentation<\/a> show examples using the Python API and multiple processes. With real operations in mind, I wanted to confirm that the feature also works in a Container-based setup, so this post reports the results of testing it with <a target=\"_blank\" href=\"https:\/\/hub.docker.com\/layers\/vllm\/vllm-openai\/v0.11.0\/images\/sha256-d8d39b59e909d2378ac4feeb191f7e7b6f1342477dc66b7c47cec89e9985ad8a\">v0.11.0<\/a>, the latest stable release at the time of writing.<\/p>\n<h2 id=\"\u74b0\u5883\u69cb\u6210\">Environment<\/h2>\n<ul>\n<li>\n<p>Hardware<\/p>\n<ul>\n<li>NVIDIA GeForce RTX 4090 
24GB<\/li>\n<\/ul>\n<\/li>\n<li>\n<p>Software<\/p>\n<ul>\n<li>Ubuntu 22.04.5 LTS<\/li>\n<li>NVIDIA Driver 580.95.05<\/li>\n<li>docker-ce 5:28.5.1-1<\/li>\n<li>vLLM v0.11.0<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h2 id=\"\u5229\u7528\u30e2\u30c7\u30eb\">Models Used<\/h2>\n<h2 id=\"\u5b9f\u9a131-API\u5229\u7528\u72b6\u6cc1\u306e\u78ba\u8a8d\">Experiment 1. Checking API Availability<\/h2>\n<h3 id=\"vLLM\u306e\u8d77\u52d5\">Starting vLLM<\/h3>\n<p>First, check whether Sleep Mode is usable in v0.11.0, the current stable release.<br \/>\nRun the following command to start the model. If the repository requires access permissions, set a token created with your Hugging Face account in the environment variable <code>HF_TOKEN<\/code> as needed.<\/p>\n<pre class=\"code\" data-lang=\"\" data-unlink=\"\">MODEL=\"google\/gemma-3-4b-it\"\n\nsudo docker run --runtime nvidia --gpus all \\\n    -v ~\/.cache\/huggingface:\/root\/.cache\/huggingface \\\n    -p 8000:8000 \\\n    --ipc=host \\\n    --env \"HUGGING_FACE_HUB_TOKEN=$HF_TOKEN\" \\\n    --env \"VLLM_SERVER_DEV_MODE=1\" \\\n    vllm\/vllm-openai:v0.11.0 \\\n    --model ${MODEL} --enable-sleep-mode<\/pre>\n<p>This command starts vLLM v0.11.0, but the following warning is logged at startup.<\/p>\n<pre class=\"code\" data-lang=\"\" data-unlink=\"\">WARNING 10-29 18:58:56 [api_server.py:966] SECURITY WARNING: Development endpoints are enabled! 
This should NOT be used in production!<\/pre>\n<p>API examples follow in the next steps. Note that the <code>\/sleep<\/code> API is currently served from the same API endpoint as the regular Chat API, so if the API is exposed as-is, any user can put the model to sleep. For this reason, production use does not appear to be recommended at present.<\/p>\n<h3 id=\"Sleep\u306b\u3088\u308b\u30e2\u30c7\u30eb\u306e\u505c\u6b62\">Stopping the Model with Sleep<\/h3>\n<p>Next, call the Sleep Mode API. It is invoked through the REST API as follows.<\/p>\n<pre class=\"code\" data-lang=\"\" data-unlink=\"\">curl -X POST 'localhost:8000\/sleep?level=1'<\/pre>\n<p>Running this produces log output like the following, and GPU memory is released.<\/p>\n<pre class=\"code\" data-lang=\"\" data-unlink=\"\">(EngineCore_DP0 pid=165) INFO 10-29 19:36:16 [cumem.py:228] CuMemAllocator: sleep freed 20.25 GiB memory in total, of which 8.63 GiB is backed up in CPU and the rest 11.62 GiB is discarded directly.\n(EngineCore_DP0 pid=165) INFO 10-29 19:36:16 [gpu_worker.py:117] Sleep mode freed 20.23 GiB memory, 1.21 GiB memory is still in use.\n(EngineCore_DP0 pid=165) INFO 10-29 19:36:16 [executor_base.py:189] It took 1.695459 seconds to fall asleep.\n(APIServer pid=1) INFO:     
172.17.0.1:36138 - \"POST \/sleep?level=1 HTTP\/1.1\" 200 OK<\/pre>\n<p>With <code>google\/gemma-3-4b-it<\/code>, the model used here, vLLM occupied about 23GB of GPU memory after startup, and I confirmed that this dropped to about 1GB after calling <code>\/sleep<\/code>.<\/p>\n<blockquote>\n<p><strong>NOTE<\/strong><br \/>As of v0.11.0, sending an inference request to a model that is in Sleep Mode results in a 400 Error, and the vLLM process crashes and stops.<\/p>\n<\/blockquote>\n<h3 id=\"WakeUp\u306b\u3088\u308b\u30e2\u30c7\u30eb\u306e\u30ea\u30ed\u30fc\u30c9\">Reloading the Model with WakeUp<\/h3>\n<p>To reload the model, run the following command.<\/p>\n<pre class=\"code\" data-lang=\"\" data-unlink=\"\">curl -X POST 'localhost:8000\/wake_up'<\/pre>\n<p>Running this produces log output like the following.<\/p>\n<pre class=\"code\" data-lang=\"\" data-unlink=\"\">(APIServer pid=1) INFO 10-29 19:43:03 [api_server.py:1016] wake up the engine with tags: None\n(EngineCore_DP0 pid=165) INFO 10-29 19:43:03 [executor_base.py:205] It took 0.368304 seconds to wake up tags {'kv_cache', 'weights'}.\n(APIServer pid=1) INFO:     172.17.0.1:58818 - \"POST \/wake_up HTTP\/1.1\" 200 
OK<\/pre>\n<p>Once WakeUp completes, the OpenAI-compatible API is available again as with a normal vLLM instance.<\/p>\n<blockquote>\n<p><strong>NOTE<\/strong><br \/>With Level 2 Sleep Mode, after <code>\/wake_up<\/code> you also need to call the <code>\/collective_rpc<\/code> API (specifying <code>reload_weights<\/code> as the method) and the <code>\/reset_prefix_cache<\/code> API.<\/p>\n<\/blockquote>\n<h3 id=\"\u5b9f\u884c\u7d50\u679c\u3068\u30d1\u30d5\u30a9\u30fc\u30de\u30f3\u30b9\u8003\u5bdf\">Results and Performance Discussion<\/h3>\n<p>The recovery steps and overhead for each level were as follows.<\/p>\n<div class=\"s_table\"><table>\n<thead>\n<tr>\n<th\/>\n<th>Level 1<\/th>\n<th>Level 2<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>GPU memory in use after Sleep<\/td>\n<td>1236MiB<\/td>\n<td>1241MiB<\/td>\n<\/tr>\n<tr>\n<td>Number of recovery steps<\/td>\n<td>1 (wake_up)<\/td>\n<td>3 (wake_up, collective_rpc, reset_prefix_cache)<\/td>\n<\/tr>\n<tr>\n<td>Recovery command execution time (sec)<\/td>\n<td>0.368304<\/td>\n<td>0.737<\/td>\n<\/tr>\n<\/tbody>\n<\/table><\/div>\n<p>With both Level 1 and Level 2, the model recovers within one second, so the API becomes usable faster than by restarting the vLLM Container. In terms of GPU memory, Level 1 and Level 
2 freed an equivalent amount. Given the more complex recovery procedure of Level 2, I think Level 1 can serve as the default. The official blog states that Level 2 saves CPU-side RAM, so Level 2 is a good choice when you want to reduce CPU memory during Sleep as well as GPU memory.<\/p>\n<h2 id=\"\u5b9f\u9a132-\u8907\u6570\u30e2\u30c7\u30eb\u3067\u306e\u5207\u308a\u66ff\u3048\">Experiment 2. Switching Between Multiple Models<\/h2>\n<p>Experiment 1 confirmed that Sleep Mode is usable with the vLLM v0.11.0 Container, so next I try switching between multiple models.<br \/>\nThe official vLLM blog showed how to launch the models as separate processes within the same environment. Here, to get closer to a production setup, I test whether models can be switched while two vLLM Containers are running against a single GPU.<\/p>\n<p>The experiment proceeds as follows.<\/p>\n<h3 id=\"\u6e96\u5099\">Preparation<\/h3>\n<ol>\n<li>Start the vLLM 
Container for the first model (port 8000)<\/li>\n<li>Put the first model to Sleep with the <code>\/sleep?level=1<\/code> API<\/li>\n<li>Start the vLLM Container for the second model (port 8001)<\/li>\n<li>Put the second model to Sleep with the <code>\/sleep?level=1<\/code> API<\/li>\n<\/ol>\n<h3 id=\"\u5b9f\u9a13\u5404\u30e2\u30c7\u30eb\u306b\u5bfe\u3057\u3066\">Experiment (for each model)<\/h3>\n<ol>\n<li>Reload the model with the <code>\/wake_up<\/code> API<\/li>\n<li>Run the healthcheck and call the inference API with the <code>sample.py<\/code> script<\/li>\n<\/ol>\n<h3 id=\"\u5b9f\u9a13\u7528\u30b9\u30af\u30ea\u30d7\u30c8\">Experiment Scripts<\/h3>\n<p>The experiment uses a Bash script like the following.<\/p>\n<pre class=\"code Bash\" data-lang=\"Bash\" data-unlink=\"\">#!\/bin\/bash\nset -e\n\n# Models to switch between (model ID and host port)\nMODELS=(\"meta-llama\/Llama-3.2-1B-Instruct 8000\" \"google\/gemma-3-4b-it 8001\")\n\nRUN_NAME=\"vllm_sleep\"\n\n# Create and start a vLLM container ($1: model ID, $2: host port)\ncreate_vllm(){\n    c_name=${1#*\/}\n    echo \"Use name ${c_name}\"\n    docker run -d --name ${RUN_NAME}-${c_name} --rm --runtime nvidia --gpus all \\\n        -v ~\/.cache\/huggingface:\/root\/.cache\/huggingface \\\n        -p $2:8000 \\\n        --ipc=host \\\n        --env \"VLLM_SERVER_DEV_MODE=1\" \\\n        --env \"HUGGING_FACE_HUB_TOKEN=${HF_TOKEN}\" \\\n        vllm\/vllm-openai:v0.11.0 \\\n        --model $1 --enable-sleep-mode\n}\n\n\n# 
Start each model and put it to Sleep once the API is reachable\nfor model in \"${MODELS[@]}\"; do\n    read model_id port \n<\/pre>\n\n<p>The Python script (<code>sample.py<\/code>) that checks whether the Container is up and calls the inference API is shown below.<\/p>\n\n<pre class=\"code Python\" data-lang=\"Python\" data-unlink=\"\">#!\/usr\/bin\/env python3\n\nfrom openai import OpenAI\nimport os\nimport requests\nimport time\nimport sys\n\nHOST = \"http:\/\/localhost\"\n\n\ndef wait_to_ready(URL):\n    # Poll the \/health endpoint until the server accepts connections\n    while True:\n        try:\n            requests.get(f\"{URL}\/health\")\n            print(\"Healthcheck: OK\")\n            break\n        except requests.exceptions.ConnectionError:\n            time.sleep(0.1)\n\n\ndef main():\n    # Host port of the target vLLM Container (defaults to 8000)\n    port = sys.argv[1] if len(sys.argv) &gt; 1 else 8000\n    url = f\"{HOST}:{port}\"\n    wait_to_ready(url)\n    # vLLM on localhost needs no real OpenAI key, so set a dummy value\n    os.environ[\"OPENAI_API_KEY\"] = \"dummy\"\n    client = OpenAI(base_url=f\"{url}\/v1\")\n    # Use the first model served by this vLLM instance\n    model_id = client.models.list().data[0].id\n    print(f\"Model {model_id} found.\")\n    response = client.responses.create(\n        model=model_id,\n        input=\"Write a one-sentence bedtime story about a unicorn.\"\n    )\n\n    print(f\"Got response: {response.output_text}\")\n\n\nif __name__ == \"__main__\":\n    main()<\/pre>\n<h3 id=\"\u5b9f\u884c\u7d50\u679c\">Results<\/h3>\n<p>When you run the experiment script, you should see that the output following <code>--- start benchmark for model switching ---<\/code> 
completes in roughly one second. This confirms that vLLM's Sleep Mode enables efficient model switching across multiple Containers and models on a single GPU.<\/p>\n<blockquote>\n<p><strong>NOTE<\/strong><br \/>In the example script, the models were started in the order <code>meta-llama\/Llama-3.2-1B-Instruct<\/code>, <code>google\/gemma-3-4b-it<\/code>, and <code>\/sleep<\/code> and <code>\/wake_up<\/code> worked without problems. In a separate experiment with <code>meta-llama\/Llama-3.2-3B-Instruct<\/code> and <code>google\/gemma-3-4b-it<\/code>, however, an Out of Memory (OOM) Error occurred when <code>meta-llama\/Llama-3.2-3B-Instruct<\/code> was woken with <code>\/wake_up<\/code> after Sleep, and the process stopped.<br \/>\n<code>meta-llama\/Llama-3.2-3B-Instruct<\/code> is a larger model than <code>meta-llama\/Llama-3.2-1B-Instruct<\/code> and needs more GPU memory, so even when switching with Sleep 
Mode, the minimum amount of memory required should be investigated in advance.<\/p>\n<\/blockquote>\n<p>This post introduced vLLM's Sleep Mode, which reduces the overhead of switching models. Sleep Mode is still a development-stage feature, but the experiments confirmed that it lets multiple models share a GPU efficiently. As demand for LLMs grows, making use of software and mechanisms like this that improve GPU efficiency will likely become increasingly important.<\/p>\n<\/div>\n<p><a href=\"https:\/\/engineers.ntt.com\/entry\/202511-vllm-sleep-mode\/entry\">View the original article 
<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"\u3053\u3093\u306b\u3061\u306f\u3002NTT\u30c9\u30b3\u30e2\u30d3\u30b8\u30cd\u30b9\u306e\u9732\u5d0e\u3067\u3059\u3002\u672c\u30d6\u30ed\u30b0\u3067\u306fvLLM\u306e\u672c\u5bb6\u30b3\u30df\u30e5\u30cb\u30c6\u30a3\u306e\u30d6\u30ed\u30b0\u3067\u7d39\u4ecb\u3055\u308c\u305fvLLM\u306e\u30e2\u30c7\u30eb\u306e\u30bc\u30ed\u30ea\u30ed\u30fc\u30c9\u5207\u308a\u66ff\u3048\u6a5f\u80fd\u306e\u6982\u8981\u306b\u52a0\u3048\u3066\u672c\u6a5f\u80fd\u3092Container\u30d9\u30fc\u30b9\u3067\u691c\u8a3c\u3057\u305f\u7d50\u679c\u306b\u3064\u3044\u3066\u7d39\u4ecb\u3057\u307e [&hellip;]","protected":false},"author":1,"featured_media":22589,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[4],"tags":[],"class_list":["post-22588","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-company-tec"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.6 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>vLLM Sleep Mode\u3088\u308b\u30e2\u30c7\u30eb\u306e\u30bc\u30ed\u30ea\u30ed\u30fc\u30c9\u5207\u308a\u66ff\u3048\u6a5f\u80fd\u306e\u691c\u8a3c - NTT docomo Business Engineers&#039; Blog - \u30dd\u30b1\u30b3\u30f3<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/engineers.ntt.com\/entry\/202511-vllm-sleep-mode\/entry\" \/>\n<meta property=\"og:locale\" content=\"ja_JP\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"vLLM Sleep Mode\u3088\u308b\u30e2\u30c7\u30eb\u306e\u30bc\u30ed\u30ea\u30ed\u30fc\u30c9\u5207\u308a\u66ff\u3048\u6a5f\u80fd\u306e\u691c\u8a3c - NTT docomo Business Engineers&#039; Blog - \u30dd\u30b1\u30b3\u30f3\" \/>\n<meta property=\"og:description\" 
content=\"\u3053\u3093\u306b\u3061\u306f\u3002NTT\u30c9\u30b3\u30e2\u30d3\u30b8\u30cd\u30b9\u306e\u9732\u5d0e\u3067\u3059\u3002\u672c\u30d6\u30ed\u30b0\u3067\u306fvLLM\u306e\u672c\u5bb6\u30b3\u30df\u30e5\u30cb\u30c6\u30a3\u306e\u30d6\u30ed\u30b0\u3067\u7d39\u4ecb\u3055\u308c\u305fvLLM\u306e\u30e2\u30c7\u30eb\u306e\u30bc\u30ed\u30ea\u30ed\u30fc\u30c9\u5207\u308a\u66ff\u3048\u6a5f\u80fd\u306e\u6982\u8981\u306b\u52a0\u3048\u3066\u672c\u6a5f\u80fd\u3092Container\u30d9\u30fc\u30b9\u3067\u691c\u8a3c\u3057\u305f\u7d50\u679c\u306b\u3064\u3044\u3066\u7d39\u4ecb\u3057\u307e [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/engineers.ntt.com\/entry\/202511-vllm-sleep-mode\/entry\" \/>\n<meta property=\"og:site_name\" content=\"\u30dd\u30b1\u30b3\u30f3\" \/>\n<meta property=\"article:published_time\" content=\"2025-11-06T22:55:05+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/pokecon.jp\/job\/wp-content\/uploads\/2025\/11\/https3A2F2Fcdn.user_.blog_.st-hatena.com2Fdefault_entry_og_image2F56944342F1752635351448040.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1300\" \/>\n\t<meta property=\"og:image:height\" content=\"731\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"info@pokecon.jp\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"\u57f7\u7b46\u8005\" \/>\n\t<meta name=\"twitter:data1\" content=\"info@pokecon.jp\" \/>\n\t<meta name=\"twitter:label2\" content=\"\u63a8\u5b9a\u8aad\u307f\u53d6\u308a\u6642\u9593\" \/>\n\t<meta name=\"twitter:data2\" content=\"3\u5206\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/engineers.ntt.com\\\/entry\\\/202511-vllm-sleep-mode\\\/entry#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/pokecon.jp\\\/job\\\/22588\\\/\"},\"author\":{\"name\":\"info@pokecon.jp\",\"@id\":\"https:\\\/\\\/pokecon.jp\\\/job\\\/#\\\/schema\\\/person\\\/16c9f07b1ba984d165d9aee259bda997\"},\"headline\":\"vLLM Sleep Mode\u3088\u308b\u30e2\u30c7\u30eb\u306e\u30bc\u30ed\u30ea\u30ed\u30fc\u30c9\u5207\u308a\u66ff\u3048\u6a5f\u80fd\u306e\u691c\u8a3c &#8211; NTT docomo Business Engineers&#8217; Blog\",\"datePublished\":\"2025-11-06T22:55:05+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/pokecon.jp\\\/job\\\/22588\\\/\"},\"wordCount\":235,\"image\":{\"@id\":\"https:\\\/\\\/engineers.ntt.com\\\/entry\\\/202511-vllm-sleep-mode\\\/entry#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/pokecon.jp\\\/job\\\/wp-content\\\/uploads\\\/2025\\\/11\\\/https3A2F2Fcdn.user_.blog_.st-hatena.com2Fdefault_entry_og_image2F56944342F1752635351448040.png\",\"articleSection\":[\"\u4f01\u696d\u30c6\u30c3\u30af\"],\"inLanguage\":\"ja\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/pokecon.jp\\\/job\\\/22588\\\/\",\"url\":\"https:\\\/\\\/engineers.ntt.com\\\/entry\\\/202511-vllm-sleep-mode\\\/entry\",\"name\":\"vLLM Sleep Mode\u3088\u308b\u30e2\u30c7\u30eb\u306e\u30bc\u30ed\u30ea\u30ed\u30fc\u30c9\u5207\u308a\u66ff\u3048\u6a5f\u80fd\u306e\u691c\u8a3c - NTT docomo Business Engineers' Blog - 
\u30dd\u30b1\u30b3\u30f3\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/pokecon.jp\\\/job\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/engineers.ntt.com\\\/entry\\\/202511-vllm-sleep-mode\\\/entry#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/engineers.ntt.com\\\/entry\\\/202511-vllm-sleep-mode\\\/entry#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/pokecon.jp\\\/job\\\/wp-content\\\/uploads\\\/2025\\\/11\\\/https3A2F2Fcdn.user_.blog_.st-hatena.com2Fdefault_entry_og_image2F56944342F1752635351448040.png\",\"datePublished\":\"2025-11-06T22:55:05+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/pokecon.jp\\\/job\\\/#\\\/schema\\\/person\\\/16c9f07b1ba984d165d9aee259bda997\"},\"breadcrumb\":{\"@id\":\"https:\\\/\\\/engineers.ntt.com\\\/entry\\\/202511-vllm-sleep-mode\\\/entry#breadcrumb\"},\"inLanguage\":\"ja\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/engineers.ntt.com\\\/entry\\\/202511-vllm-sleep-mode\\\/entry\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"ja\",\"@id\":\"https:\\\/\\\/engineers.ntt.com\\\/entry\\\/202511-vllm-sleep-mode\\\/entry#primaryimage\",\"url\":\"https:\\\/\\\/pokecon.jp\\\/job\\\/wp-content\\\/uploads\\\/2025\\\/11\\\/https3A2F2Fcdn.user_.blog_.st-hatena.com2Fdefault_entry_og_image2F56944342F1752635351448040.png\",\"contentUrl\":\"https:\\\/\\\/pokecon.jp\\\/job\\\/wp-content\\\/uploads\\\/2025\\\/11\\\/https3A2F2Fcdn.user_.blog_.st-hatena.com2Fdefault_entry_og_image2F56944342F1752635351448040.png\",\"width\":1300,\"height\":731},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/engineers.ntt.com\\\/entry\\\/202511-vllm-sleep-mode\\\/entry#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"\u30db\u30fc\u30e0\",\"item\":\"https:\\\/\\\/pokecon.jp\\\/job\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"vLLM Sleep Mode\u3088\u308b\u30e2\u30c7\u30eb\u306e\u30bc\u30ed\u30ea\u30ed\u30fc\u30c9\u5207\u308a\u66ff\u3048\u6a5f\u80fd\u306e\u691c\u8a3c &#8211; 
NTT docomo Business Engineers&#8217; Blog\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/pokecon.jp\\\/job\\\/#website\",\"url\":\"https:\\\/\\\/pokecon.jp\\\/job\\\/\",\"name\":\"\u30dd\u30b1\u30b3\u30f3\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/pokecon.jp\\\/job\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"ja\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/pokecon.jp\\\/job\\\/#\\\/schema\\\/person\\\/16c9f07b1ba984d165d9aee259bda997\",\"name\":\"info@pokecon.jp\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ja\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/2b0549cd9f7907c092ca5fbb283baf72337f235726e4b46fa39ec0b701ac2fe2?s=96&d=wavatar&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/2b0549cd9f7907c092ca5fbb283baf72337f235726e4b46fa39ec0b701ac2fe2?s=96&d=wavatar&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/2b0549cd9f7907c092ca5fbb283baf72337f235726e4b46fa39ec0b701ac2fe2?s=96&d=wavatar&r=g\",\"caption\":\"info@pokecon.jp\"},\"url\":\"https:\\\/\\\/pokecon.jp\\\/job\\\/author\\\/infopokecon-jp\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"vLLM Sleep Mode\u3088\u308b\u30e2\u30c7\u30eb\u306e\u30bc\u30ed\u30ea\u30ed\u30fc\u30c9\u5207\u308a\u66ff\u3048\u6a5f\u80fd\u306e\u691c\u8a3c - NTT docomo Business Engineers' Blog - \u30dd\u30b1\u30b3\u30f3","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/engineers.ntt.com\/entry\/202511-vllm-sleep-mode\/entry","og_locale":"ja_JP","og_type":"article","og_title":"vLLM Sleep Mode\u3088\u308b\u30e2\u30c7\u30eb\u306e\u30bc\u30ed\u30ea\u30ed\u30fc\u30c9\u5207\u308a\u66ff\u3048\u6a5f\u80fd\u306e\u691c\u8a3c - NTT docomo Business Engineers' Blog - \u30dd\u30b1\u30b3\u30f3","og_description":"\u3053\u3093\u306b\u3061\u306f\u3002NTT\u30c9\u30b3\u30e2\u30d3\u30b8\u30cd\u30b9\u306e\u9732\u5d0e\u3067\u3059\u3002\u672c\u30d6\u30ed\u30b0\u3067\u306fvLLM\u306e\u672c\u5bb6\u30b3\u30df\u30e5\u30cb\u30c6\u30a3\u306e\u30d6\u30ed\u30b0\u3067\u7d39\u4ecb\u3055\u308c\u305fvLLM\u306e\u30e2\u30c7\u30eb\u306e\u30bc\u30ed\u30ea\u30ed\u30fc\u30c9\u5207\u308a\u66ff\u3048\u6a5f\u80fd\u306e\u6982\u8981\u306b\u52a0\u3048\u3066\u672c\u6a5f\u80fd\u3092Container\u30d9\u30fc\u30b9\u3067\u691c\u8a3c\u3057\u305f\u7d50\u679c\u306b\u3064\u3044\u3066\u7d39\u4ecb\u3057\u307e 
[&hellip;]","og_url":"https:\/\/engineers.ntt.com\/entry\/202511-vllm-sleep-mode\/entry","og_site_name":"\u30dd\u30b1\u30b3\u30f3","article_published_time":"2025-11-06T22:55:05+00:00","og_image":[{"width":1300,"height":731,"url":"https:\/\/pokecon.jp\/job\/wp-content\/uploads\/2025\/11\/https3A2F2Fcdn.user_.blog_.st-hatena.com2Fdefault_entry_og_image2F56944342F1752635351448040.png","type":"image\/png"}],"author":"info@pokecon.jp","twitter_card":"summary_large_image","twitter_misc":{"\u57f7\u7b46\u8005":"info@pokecon.jp","\u63a8\u5b9a\u8aad\u307f\u53d6\u308a\u6642\u9593":"3\u5206"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/engineers.ntt.com\/entry\/202511-vllm-sleep-mode\/entry#article","isPartOf":{"@id":"https:\/\/pokecon.jp\/job\/22588\/"},"author":{"name":"info@pokecon.jp","@id":"https:\/\/pokecon.jp\/job\/#\/schema\/person\/16c9f07b1ba984d165d9aee259bda997"},"headline":"vLLM Sleep Mode\u3088\u308b\u30e2\u30c7\u30eb\u306e\u30bc\u30ed\u30ea\u30ed\u30fc\u30c9\u5207\u308a\u66ff\u3048\u6a5f\u80fd\u306e\u691c\u8a3c &#8211; NTT docomo Business Engineers&#8217; Blog","datePublished":"2025-11-06T22:55:05+00:00","mainEntityOfPage":{"@id":"https:\/\/pokecon.jp\/job\/22588\/"},"wordCount":235,"image":{"@id":"https:\/\/engineers.ntt.com\/entry\/202511-vllm-sleep-mode\/entry#primaryimage"},"thumbnailUrl":"https:\/\/pokecon.jp\/job\/wp-content\/uploads\/2025\/11\/https3A2F2Fcdn.user_.blog_.st-hatena.com2Fdefault_entry_og_image2F56944342F1752635351448040.png","articleSection":["\u4f01\u696d\u30c6\u30c3\u30af"],"inLanguage":"ja"},{"@type":"WebPage","@id":"https:\/\/pokecon.jp\/job\/22588\/","url":"https:\/\/engineers.ntt.com\/entry\/202511-vllm-sleep-mode\/entry","name":"vLLM Sleep Mode\u3088\u308b\u30e2\u30c7\u30eb\u306e\u30bc\u30ed\u30ea\u30ed\u30fc\u30c9\u5207\u308a\u66ff\u3048\u6a5f\u80fd\u306e\u691c\u8a3c - NTT docomo Business Engineers' Blog - 
\u30dd\u30b1\u30b3\u30f3","isPartOf":{"@id":"https:\/\/pokecon.jp\/job\/#website"},"primaryImageOfPage":{"@id":"https:\/\/engineers.ntt.com\/entry\/202511-vllm-sleep-mode\/entry#primaryimage"},"image":{"@id":"https:\/\/engineers.ntt.com\/entry\/202511-vllm-sleep-mode\/entry#primaryimage"},"thumbnailUrl":"https:\/\/pokecon.jp\/job\/wp-content\/uploads\/2025\/11\/https3A2F2Fcdn.user_.blog_.st-hatena.com2Fdefault_entry_og_image2F56944342F1752635351448040.png","datePublished":"2025-11-06T22:55:05+00:00","author":{"@id":"https:\/\/pokecon.jp\/job\/#\/schema\/person\/16c9f07b1ba984d165d9aee259bda997"},"breadcrumb":{"@id":"https:\/\/engineers.ntt.com\/entry\/202511-vllm-sleep-mode\/entry#breadcrumb"},"inLanguage":"ja","potentialAction":[{"@type":"ReadAction","target":["https:\/\/engineers.ntt.com\/entry\/202511-vllm-sleep-mode\/entry"]}]},{"@type":"ImageObject","inLanguage":"ja","@id":"https:\/\/engineers.ntt.com\/entry\/202511-vllm-sleep-mode\/entry#primaryimage","url":"https:\/\/pokecon.jp\/job\/wp-content\/uploads\/2025\/11\/https3A2F2Fcdn.user_.blog_.st-hatena.com2Fdefault_entry_og_image2F56944342F1752635351448040.png","contentUrl":"https:\/\/pokecon.jp\/job\/wp-content\/uploads\/2025\/11\/https3A2F2Fcdn.user_.blog_.st-hatena.com2Fdefault_entry_og_image2F56944342F1752635351448040.png","width":1300,"height":731},{"@type":"BreadcrumbList","@id":"https:\/\/engineers.ntt.com\/entry\/202511-vllm-sleep-mode\/entry#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"\u30db\u30fc\u30e0","item":"https:\/\/pokecon.jp\/job\/"},{"@type":"ListItem","position":2,"name":"vLLM Sleep Mode\u3088\u308b\u30e2\u30c7\u30eb\u306e\u30bc\u30ed\u30ea\u30ed\u30fc\u30c9\u5207\u308a\u66ff\u3048\u6a5f\u80fd\u306e\u691c\u8a3c &#8211; NTT docomo Business Engineers&#8217; 
Blog"}]},{"@type":"WebSite","@id":"https:\/\/pokecon.jp\/job\/#website","url":"https:\/\/pokecon.jp\/job\/","name":"\u30dd\u30b1\u30b3\u30f3","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/pokecon.jp\/job\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"ja"},{"@type":"Person","@id":"https:\/\/pokecon.jp\/job\/#\/schema\/person\/16c9f07b1ba984d165d9aee259bda997","name":"info@pokecon.jp","image":{"@type":"ImageObject","inLanguage":"ja","@id":"https:\/\/secure.gravatar.com\/avatar\/2b0549cd9f7907c092ca5fbb283baf72337f235726e4b46fa39ec0b701ac2fe2?s=96&d=wavatar&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/2b0549cd9f7907c092ca5fbb283baf72337f235726e4b46fa39ec0b701ac2fe2?s=96&d=wavatar&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/2b0549cd9f7907c092ca5fbb283baf72337f235726e4b46fa39ec0b701ac2fe2?s=96&d=wavatar&r=g","caption":"info@pokecon.jp"},"url":"https:\/\/pokecon.jp\/job\/author\/infopokecon-jp\/"}]}},"_links":{"self":[{"href":"https:\/\/pokecon.jp\/job\/wp-json\/wp\/v2\/posts\/22588","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pokecon.jp\/job\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/pokecon.jp\/job\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/pokecon.jp\/job\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/pokecon.jp\/job\/wp-json\/wp\/v2\/comments?post=22588"}],"version-history":[{"count":1,"href":"https:\/\/pokecon.jp\/job\/wp-json\/wp\/v2\/posts\/22588\/revisions"}],"predecessor-version":[{"id":22590,"href":"https:\/\/pokecon.jp\/job\/wp-json\/wp\/v2\/posts\/22588\/revisions\/22590"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/pokecon.jp\/job\/wp-json\/wp\/v2\/media\/22589"}],"wp:attachment":[{"href":"https:\/\/pokecon.jp\/job\/wp-json\/wp\/v2\/media?parent=22588"}],
"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/pokecon.jp\/job\/wp-json\/wp\/v2\/categories?post=22588"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/pokecon.jp\/job\/wp-json\/wp\/v2\/tags?post=22588"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}