[refactor] Simplification of Speculative decoding configs #5639


Merged
merged 3 commits into NVIDIA:main from user/wili/spec-configs on Jul 10, 2025

Conversation

wili-65535
Collaborator

@wili-65535 wili-65535 commented Jul 1, 2025

Description

An updated version of PR #5296.

  • We hit a strange pytest thread-leak error in the old PR (https://nvidia.slack.com/archives/C059LSY62BT/p1751905741648539).

  • Replace class SpecConfig in tensorrt_llm/_torch/speculative/interface.py with class DecodingBaseConfig in tensorrt_llm/llmapi/llm_args.py.

    • Remove the classes derived from SpecConfig, such as DraftTargetConfig, NGramConfig, etc.
    • Make some small adjustments to class DecodingBaseConfig for the related usage.
  • Adopt Fanrong's suggestions from the old PR.

  • Unify pytorch_weights_path and speculative_model into speculative_model_dir in many places (to align with model_dir); see the illustrative sketch after this list.

  • Unify max_draft_tokens into max_draft_len in many places.

  • Replace prompt_lookup_num_tokens with max_draft_len in the NGram PyTorch workflow.

    • The remaining prompt_lookup_num_tokens is only used by the C++ code, examples/draft_target_model, and related tests; we will remove it in a later PR.
  • Rewrite speculative decoding tests in tests/unittest/_torch/speculative/ to use a unified code style.

  • A small difference from the old PR:

    • We keep the methods update_from_model_config, get_draft_model_prompt, and get_num_extra_kv_tokens as class methods rather than stand-alone helper functions, since the errors reappear when they are split out of the class; we may fix this in a later PR.
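
To make the unified naming concrete, here is a minimal, illustrative sketch of how a speculative-decoding config could be passed to the LLM API after this change. It is not taken from this PR's diff: the class names NGramDecodingConfig and EagleDecodingConfig and the max_matching_ngram_size field are assumptions about the surrounding llmapi code, while max_draft_len and speculative_model_dir are the unified names described above.

    from tensorrt_llm import LLM
    from tensorrt_llm.llmapi import EagleDecodingConfig, NGramDecodingConfig

    # NGram drafting in the PyTorch workflow: max_draft_len replaces the old
    # prompt_lookup_num_tokens / max_draft_tokens names.
    ngram_config = NGramDecodingConfig(
        max_draft_len=4,            # unified name (was max_draft_tokens)
        max_matching_ngram_size=2,  # assumed pre-existing NGram option
    )

    # Draft-model-based drafting (e.g. EAGLE): speculative_model_dir replaces
    # pytorch_weights_path / speculative_model, aligning with model_dir.
    eagle_config = EagleDecodingConfig(
        max_draft_len=4,
        speculative_model_dir="/path/to/draft/model",  # unified name
    )

    llm = LLM(
        model="/path/to/target/model",
        speculative_config=ngram_config,  # or eagle_config
    )

Keeping one set of field names across the NGram, draft-target, and other speculation configs is what allows the per-method SpecConfig subclasses to be dropped in favor of the single DecodingBaseConfig hierarchy in llm_args.py.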

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md.

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
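
For example, on a pull request like this one the bot is driven entirely through comments; a few representative invocations assembled from the grammar above (the stage names, GPU names, and skip reason below are placeholders, not taken from this PR) would be:

    /bot run --disable-fail-fast
    /bot run --stage-list "A10-1"
    /bot run --gpu-type "A30, H100_PCIe"
    /bot skip --comment "Docs-only change, CI not needed"
    /bot reuse-pipeline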

@NVIDIA NVIDIA deleted a comment from tensorrt-cicd Jul 1, 2025
@NVIDIA NVIDIA deleted a comment from tensorrt-cicd Jul 1, 2025
@NVIDIA NVIDIA deleted a comment from tensorrt-cicd Jul 1, 2025
@wili-65535 wili-65535 force-pushed the user/wili/spec-configs branch from ad5069d to d8af844 Compare July 1, 2025 10:49
@NVIDIA NVIDIA deleted a comment from tensorrt-cicd Jul 1, 2025
@NVIDIA NVIDIA deleted a comment from tensorrt-cicd Jul 1, 2025
@NVIDIA NVIDIA deleted a comment from tensorrt-cicd Jul 1, 2025
@wili-65535 wili-65535 force-pushed the user/wili/spec-configs branch from d8af844 to 79a75fb Compare July 1, 2025 15:32
@NVIDIA NVIDIA deleted a comment from tensorrt-cicd Jul 1, 2025
@NVIDIA NVIDIA deleted a comment from tensorrt-cicd Jul 1, 2025
@wili-65535 wili-65535 force-pushed the user/wili/spec-configs branch 2 times, most recently from 3081c65 to d5115b5 Compare July 2, 2025 09:51
@NVIDIA NVIDIA deleted a comment from tensorrt-cicd Jul 2, 2025
@NVIDIA NVIDIA deleted a comment from tensorrt-cicd Jul 2, 2025
@wili-65535 wili-65535 force-pushed the user/wili/spec-configs branch from d5115b5 to f90c7ed Compare July 2, 2025 13:01
@NVIDIA NVIDIA deleted a comment from tensorrt-cicd Jul 2, 2025
@NVIDIA NVIDIA deleted a comment from tensorrt-cicd Jul 2, 2025
@NVIDIA NVIDIA deleted a comment from tensorrt-cicd Jul 2, 2025
@wili-65535 wili-65535 force-pushed the user/wili/spec-configs branch from f90c7ed to 0fb10ba Compare July 3, 2025 08:21
@NVIDIA NVIDIA deleted a comment from tensorrt-cicd Jul 3, 2025
@wili-65535 wili-65535 force-pushed the user/wili/spec-configs branch from 0fb10ba to b1e01ef Compare July 3, 2025 11:06
@NVIDIA NVIDIA deleted a comment from tensorrt-cicd Jul 3, 2025
@NVIDIA NVIDIA deleted a comment from tensorrt-cicd Jul 3, 2025
@wili-65535 wili-65535 force-pushed the user/wili/spec-configs branch from 9606876 to 2144c2e Compare July 3, 2025 15:26
@NVIDIA NVIDIA deleted a comment from tensorrt-cicd Jul 3, 2025
@wili-65535 wili-65535 force-pushed the user/wili/spec-configs branch from 2144c2e to 5478d09 Compare July 4, 2025 00:23
@NVIDIA NVIDIA deleted a comment from tensorrt-cicd Jul 4, 2025
@NVIDIA NVIDIA deleted a comment from tensorrt-cicd Jul 4, 2025
@wili-65535 wili-65535 force-pushed the user/wili/spec-configs branch 2 times, most recently from f732b13 to 9219d7f Compare July 4, 2025 03:13
@wili-65535 wili-65535 force-pushed the user/wili/spec-configs branch from 40293fe to b98113f Compare July 9, 2025 03:33
@wili-65535
Collaborator Author

/bot run --disable-fail-fast

@NVIDIA NVIDIA deleted a comment from tensorrt-cicd Jul 9, 2025
@tensorrt-cicd
Collaborator

PR_Github #11388 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #11388 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #8421 completed with status: 'FAILURE'

@wili-65535 wili-65535 force-pushed the user/wili/spec-configs branch from b98113f to 7f3f013 Compare July 10, 2025 03:55
@wili-65535 wili-65535 requested a review from a team as a code owner July 10, 2025 03:55
@wili-65535 wili-65535 requested a review from suyoggupta July 10, 2025 03:55
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
@wili-65535 wili-65535 force-pushed the user/wili/spec-configs branch from 7f3f013 to 6064bb4 Compare July 10, 2025 03:57
@wili-65535
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #11498 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #11498 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8507 completed with status: 'FAILURE'

@wili-65535
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #11533 [ run ] triggered by Bot

@wili-65535 wili-65535 requested a review from lfr-0531 July 10, 2025 11:17
@tensorrt-cicd
Collaborator

tensorrt-cicd commented Jul 10, 2025

PR_Github #11533 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #8536 completed with status: 'SUCCESS'

wili-65535: Strange error, all tests passed but the pipeline failed at "Kill previous jobs".

@wili-65535
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #11554 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #11554 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8555 completed with status: 'SUCCESS'

@wili-65535
Collaborator Author

wili-65535 commented Jul 10, 2025

Finally we have a healthy commit with a passing pipeline for this PR!!!
Ready for review now!

Collaborator

@mikeiovine mikeiovine left a comment

Thanks, let's fix the thread leak stuff in a follow up.

Member

@lucaslie lucaslie left a comment

Approving AutoDeploy-related changes

@mikeiovine mikeiovine merged commit 2e3cf42 into NVIDIA:main Jul 10, 2025
3 checks passed
zhou-yuxin pushed a commit to zhou-yuxin/TensorRT-LLM that referenced this pull request Jul 15, 2025
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
Signed-off-by: Yuxin <yuxinz@nvidia.com>
4 participants