[refactor] Simplification of Speculative decoding configs #5639

wili-65535 · 2025-07-01T09:15:01Z

Description

An update of PR5296.

We met a weird error in thread-leak of pytest in the old PR (https://nvidia.slack.com/archives/C059LSY62BT/p1751905741648539).
- We worked around the error in this PR with a unusual way (https://github.com/NVIDIA/TensorRT-LLM/pull/5639/files#diff-8dae39367d8c7b1d446c04cb21055db2b911c88bf1661a8f1e9a9e696b3cf4a6R160).
- If we removed class SpecConfig directly, the error would be raised stablely.
- We will fix it in a later PR.
Replace class SpecConfig of tensorrt_llm/_torch/speculatice/interface.py into class DecodingBaseConfig of tensorrt_llm/llmapi/llm_args.py.
- Remove derived classes of class SpecConfig such as DraftTargetConfig, NGramConfig, etc..
- Some small adjust of class DecodingBaseConfig for related usage.
Adopt Fanrong's suggestions for the old PR.
Unify pytorch_weights_path or speculative_model into speculative_model_dir in many places (to align with model_dir).
Unify max_draft_tokens into max_draft_len in many places.
Replace prompt_lookup_num_tokens in NGram pytorch workflow into max_draft_len.
- The remained prompt_lookup_num_tokens is for C++ code, examples/draft_target_model and related tests, we will remove it in a later PR.
Rewrite speculative decoding tests in tests/unittest/_torch/speculative/ to use a unified code style.
A small difference than the old PR:
- We keep the methods update_from_model_config, get_draft_model_prompt, and get_num_extra_kv_tokens as class methods rather than stand-alone tool functions, since I find the errors raise again when I split them out of the class, we may fix later.

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

wili-65535 · 2025-07-09T03:33:13Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-07-09T03:38:16Z

PR_Github #11388 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-09T12:16:19Z

PR_Github #11388 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #8421 completed with status: 'FAILURE'

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>

wili-65535 · 2025-07-10T03:57:17Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-07-10T04:02:39Z

PR_Github #11498 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-10T08:08:06Z

PR_Github #11498 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8507 completed with status: 'FAILURE'

wili-65535 · 2025-07-10T08:13:54Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-07-10T08:19:49Z

PR_Github #11533 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-10T12:33:59Z

PR_Github #11533 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #8536 completed with status: 'SUCCESS'

wili-65535: Strange error, all tests passed but the pipeline failed at "Kill previous jobs".

wili-65535 · 2025-07-10T12:53:27Z

/bot run

tensorrt-cicd · 2025-07-10T12:59:16Z

PR_Github #11554 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-10T13:23:19Z

PR_Github #11554 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8555 completed with status: 'SUCCESS'

wili-65535 · 2025-07-10T13:43:00Z

Finally here we have a healthy commit with pipeline passed for this PR!!!
Ready for review now!

mikeiovine

Thanks, let's fix the thread leak stuff in a follow up.

tensorrt_llm/_torch/speculative/interface.py

lucaslie

Approving AutoDeploy-related changes

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com> Signed-off-by: Yuxin <yuxinz@nvidia.com>

NVIDIA deleted a comment from tensorrt-cicd Jul 1, 2025

wili-65535 force-pushed the user/wili/spec-configs branch from ad5069d to d8af844 Compare July 1, 2025 10:49

NVIDIA deleted a comment from tensorrt-cicd Jul 1, 2025

wili-65535 force-pushed the user/wili/spec-configs branch from d8af844 to 79a75fb Compare July 1, 2025 15:32

NVIDIA deleted a comment from tensorrt-cicd Jul 1, 2025

wili-65535 force-pushed the user/wili/spec-configs branch 2 times, most recently from 3081c65 to d5115b5 Compare July 2, 2025 09:51

NVIDIA deleted a comment from tensorrt-cicd Jul 2, 2025

wili-65535 force-pushed the user/wili/spec-configs branch from d5115b5 to f90c7ed Compare July 2, 2025 13:01

NVIDIA deleted a comment from tensorrt-cicd Jul 2, 2025

wili-65535 force-pushed the user/wili/spec-configs branch from f90c7ed to 0fb10ba Compare July 3, 2025 08:21

NVIDIA deleted a comment from tensorrt-cicd Jul 3, 2025

wili-65535 force-pushed the user/wili/spec-configs branch from 0fb10ba to b1e01ef Compare July 3, 2025 11:06

NVIDIA deleted a comment from tensorrt-cicd Jul 3, 2025

wili-65535 force-pushed the user/wili/spec-configs branch from 9606876 to 2144c2e Compare July 3, 2025 15:26

NVIDIA deleted a comment from tensorrt-cicd Jul 3, 2025

wili-65535 force-pushed the user/wili/spec-configs branch from 2144c2e to 5478d09 Compare July 4, 2025 00:23

NVIDIA deleted a comment from tensorrt-cicd Jul 4, 2025

wili-65535 force-pushed the user/wili/spec-configs branch 2 times, most recently from f732b13 to 9219d7f Compare July 4, 2025 03:13

wili-65535 force-pushed the user/wili/spec-configs branch from 40293fe to b98113f Compare July 9, 2025 03:33

NVIDIA deleted a comment from tensorrt-cicd Jul 9, 2025

wili-65535 force-pushed the user/wili/spec-configs branch from b98113f to 7f3f013 Compare July 10, 2025 03:55

wili-65535 requested a review from a team as a code owner July 10, 2025 03:55

wili-65535 requested a review from suyoggupta July 10, 2025 03:55

wili-65535 added 3 commits July 10, 2025 11:56

v0.1: cherry-pick and follow Fanrong suggestions

e0ea6e7

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>

v0.2: rebase after UserProvided

04d2a23

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>

v0.3: safe change v1 (no flaky error, no tool function)

6064bb4

Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>

wili-65535 force-pushed the user/wili/spec-configs branch from 7f3f013 to 6064bb4 Compare July 10, 2025 03:57

wili-65535 requested a review from lfr-0531 July 10, 2025 11:17

mikeiovine approved these changes Jul 10, 2025

View reviewed changes

tensorrt_llm/_torch/speculative/interface.py Show resolved Hide resolved

lucaslie approved these changes Jul 10, 2025

View reviewed changes

mikeiovine merged commit 2e3cf42 into NVIDIA:main Jul 10, 2025
3 checks passed

mikeiovine mentioned this pull request Jul 10, 2025

[fix] Remove SpecConfig and fix thread leak issues #5931

Merged

This was referenced Jul 11, 2025

[refactor] Simplification of Speculative decoding configs - Part 2 #5936

Merged

[refactor] Unify name of NGram speculative decoding #5937

Merged

[refactor] Simplification of Speculative decoding configs #5639

[refactor] Simplification of Speculative decoding configs #5639

Uh oh!

Conversation

wili-65535 commented Jul 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Test Coverage

GitHub Bot Help

kill

skip

reuse-pipeline

Uh oh!

wili-65535 commented Jul 9, 2025

Uh oh!

tensorrt-cicd commented Jul 9, 2025

Uh oh!

tensorrt-cicd commented Jul 9, 2025

Uh oh!

wili-65535 commented Jul 10, 2025

Uh oh!

tensorrt-cicd commented Jul 10, 2025

Uh oh!

tensorrt-cicd commented Jul 10, 2025

Uh oh!

wili-65535 commented Jul 10, 2025

Uh oh!

tensorrt-cicd commented Jul 10, 2025

Uh oh!

tensorrt-cicd commented Jul 10, 2025 • edited by wili-65535 Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wili-65535 commented Jul 10, 2025

Uh oh!

tensorrt-cicd commented Jul 10, 2025

Uh oh!

tensorrt-cicd commented Jul 10, 2025

Uh oh!

wili-65535 commented Jul 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mikeiovine left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

lucaslie left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

wili-65535 commented Jul 1, 2025 •

edited

Loading

tensorrt-cicd commented Jul 10, 2025 •

edited by wili-65535

Loading

wili-65535 commented Jul 10, 2025 •

edited

Loading