SGLang 部署与使用指南
目录
环境准备与安装
基础环境配置
1 2 3 4 5 6 7 8
| sysctl -w net.ipv6.conf.all.disable_ipv6=1 sysctl -w net.ipv6.conf.default.disable_ipv6=1
export GLOO_SOCKET_IFNAME=eth0
export NCCL_IB_DISABLE=1
|
Docker容器部署
1 2 3 4 5 6 7 8
| docker run -d -t --network=host --gpus all \ --privileged \ --ipc=host \ --cap-add=SYS_PTRACE \ --name sglang \ --ulimit memlock=-1 \ --ulimit stack=67108864 \ -v /mnt:/mnt registry.cn-hangzhou.aliyuncs.com/lky-deploy/llm:sglang-0.4.2.post2 bash
|
服务启动配置
https://github.com/sgl-project/sglang/blob/main/benchmark/deepseek_v3/README.md
单节点启动
1 2 3 4 5 6 7 8 9
| python3 -m sglang.launch_server \ --port 40000 \ --model-path /mnt/DeepSeek-R1 \ --trust-remote-code \ --tp 32 \ --host 0.0.0.0 \
--enable-torch-compile --torch-compile-max-bs
|
双节点分布式启动
节点1 (Rank 0):
1 2 3 4 5 6 7 8 9 10 11
| python3 -m sglang.launch_server \ --port 30000 \ --model-path /mnt/7B \ --mem-fraction-static 0.8 \ --trust-remote-code \ --tp 16 \ --dp 2 \ --enable-dp-attention \ --dist-init-addr 10.0.1.105:20000 \ --nnodes 2 \ --node-rank 0
|
节点2 (Rank 1):
功能测试
1 2 3 4 5 6 7 8 9
| curl http://localhost:30000/generate \ -H "Content-Type: application/json" \ -d '{ "text": "deepseek中有几个e?", "sampling_params": { "max_new_tokens": 3000, "temperature": 0 } }'
|
性能基准测试
1 2 3 4 5 6 7 8 9
| python -m sglang.bench_serving \ --backend sglang --host 127.0.0.1 --port 30000 \ --model /mnt/DeepSeek-V3/DeepSeek-V3/ \ --dataset-name random \ --random-input-len 20000 \ --random-output-len 4096 \ --max-concurrency 8 \ --num-prompts 80 \ --dataset-path /root/ShareGPT_V3_unfiltered_cleaned_split.json &> 20k_8_80.txt &
|
PD分离部署
https://docs.sglang.ai/advanced_features/pd_disaggregation.html#deepseek-multi-node
mooncake需要rdma这种InfiniBand设备支持,nixl不需要
环境准备
1 2 3 4 5 6 7 8 9 10
| wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh bash Miniconda3-latest-Linux-x86_64.sh ~/miniconda3/bin/conda init bash source ~/.bashrc
conda create -n sglang python=3.10 conda activate sglang pip install sglang[all]
|
服务部署
Prefill服务:
1 2 3 4 5 6 7 8 9 10
|
python -m sglang.launch_server \ --model-path $path \ --disaggregation-mode prefill \ --port 30000 \ --disaggregation-transfer-backend nixl
|
Decode服务:
1 2 3 4 5 6
| python -m sglang.launch_server \ --model-path $path \ --disaggregation-mode decode \ --port 30001 \ --base-gpu-id 1 \ --disaggregation-transfer-backend nixl
|
路由服务:
https://docs.sglang.ai/advanced_features/router.html#mode-3-prefill-decode-disaggregation
1 2 3 4 5 6
| python -m sglang_router.launch_router \ --pd-disaggregation \ --prefill http://${p}:30000 8998 \ --decode http://127.0.0.1:30001 \ --host 0.0.0.0 \ --port 8000
|
分布式部署 (2P1D)
Prefill节点配置
节点0:
1 2 3 4 5 6 7 8 9 10 11
| export CUDA_VISIBLE_DEVICES=0 python -m sglang.launch_server \ --model-path $path \ --disaggregation-transfer-backend nixl \ --disaggregation-mode prefill \ --host 0.0.0.0 \ --port 30000 \ --dist-init-addr ${prefill_master_ip}:5000 \ --nnodes 2 \ --node-rank 0 \ --tp-size 2
|
节点1:
Decode节点配置
1 2 3 4 5 6 7 8 9 10 11
| python -m sglang.launch_server \ --model-path $path \ --disaggregation-transfer-backend nixl \ --disaggregation-mode decode \ --port 30001 \ --dist-init-addr ${decode_master_ip}:5000 \ --nnodes 2 \ --node-rank 0 \ --tp-size 16 \ --dp-size 8 \ --enable-dp-attention
|