[refactor] Simplification of Speculative decoding configs #5639
/bot run --disable-fail-fast
PR_Github #11388 [ run ] triggered by Bot
PR_Github #11388 [ run ] completed with state
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
/bot run --disable-fail-fast
PR_Github #11498 [ run ] triggered by Bot
PR_Github #11498 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #11533 [ run ] triggered by Bot
PR_Github #11533 [ run ] completed with state
wili-65535: Strange error, all tests passed but the pipeline failed at "Kill previous jobs".
/bot run
PR_Github #11554 [ run ] triggered by Bot
PR_Github #11554 [ run ] completed with state
Finally we have a healthy commit with the pipeline passed for this PR!!!
Thanks, let's fix the thread leak stuff in a follow up.
Approving AutoDeploy-related changes
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com> Signed-off-by: Yuxin <yuxinz@nvidia.com>
Description
An update of PR5296.
We hit a weird error in the pytest thread-leak check in the old PR (https://nvidia.slack.com/archives/C059LSY62BT/p1751905741648539). [...] `class SpecConfig` directly, the error is raised consistently.
- Replace `class SpecConfig` of `tensorrt_llm/_torch/speculative/interface.py` with `class DecodingBaseConfig` of `tensorrt_llm/llmapi/llm_args.py`.
- [...] `class SpecConfig` such as `DraftTargetConfig`, `NGramConfig`, etc.
- [...] `class DecodingBaseConfig` for related usage.
- Adopt Fanrong's suggestions for the old PR.
- Unify `pytorch_weights_path` or `speculative_model` into `speculative_model_dir` in many places (to align with `model_dir`).
- Unify `max_draft_tokens` into `max_draft_len` in many places.
- Replace `prompt_lookup_num_tokens` in the NGram PyTorch workflow with `max_draft_len`. `prompt_lookup_num_tokens` is still used by the C++ code, `examples/draft_target_model`, and related tests; we will remove it in a later PR.
- Rewrite the speculative decoding tests in `tests/unittest/_torch/speculative/` to use a unified code style.
- A small difference from the old PR: keep `update_from_model_config`, `get_draft_model_prompt`, and `get_num_extra_kv_tokens` as class methods rather than stand-alone tool functions, since the errors are raised again when I split them out of the class. We may fix this later.
Test Coverage
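The option renames listed in the Description can be illustrated with a small, self-contained sketch. This is not code from the PR or from `tensorrt_llm`: the helper name `migrate_spec_kwargs` and the mapping table `LEGACY_TO_UNIFIED` are hypothetical, and the mapping of `prompt_lookup_num_tokens` is assumed to apply only to the NGram PyTorch workflow, as the description states.

```python
# Hypothetical helper (not part of tensorrt_llm): rename legacy speculative
# decoding option names to the unified names described in this PR.
LEGACY_TO_UNIFIED = {
    "pytorch_weights_path": "speculative_model_dir",
    "speculative_model": "speculative_model_dir",
    "max_draft_tokens": "max_draft_len",
    # Assumption: only relevant for the NGram PyTorch workflow; the C++ path
    # still uses prompt_lookup_num_tokens according to the description.
    "prompt_lookup_num_tokens": "max_draft_len",
}


def migrate_spec_kwargs(kwargs: dict) -> dict:
    """Return a copy of kwargs with legacy keys replaced by unified keys."""
    migrated = {}
    for key, value in kwargs.items():
        new_key = LEGACY_TO_UNIFIED.get(key, key)
        # Two legacy keys may map to the same unified key; refuse to guess
        # which one wins if their values disagree.
        if new_key in migrated and migrated[new_key] != value:
            raise ValueError(f"conflicting values for {new_key!r}")
        migrated[new_key] = value
    return migrated


if __name__ == "__main__":
    old = {"max_draft_tokens": 4, "speculative_model": "/path/to/draft"}
    print(migrate_spec_kwargs(old))
```

A check of this kind could help confirm that call sites no longer pass the old names, though the actual tests in the PR may be organized quite differently.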
GitHub Bot Help
`/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...`
Provide a user friendly way for developers to interact with a Jenkins server.
Run `/bot [-h|--help]` to print this help message.
See details below for each supported subcommand.

run
`run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]`
Launch build/test pipelines. All previously running jobs will be killed.
- `--disable-fail-fast` (OPTIONAL) : Disable fail fast on build/tests/infra failures.
- `--skip-test` (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
- `--stage-list "A10-1, xxx"` (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.
- `--gpu-type "A30, H100_PCIe"` (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- `--only-multi-gpu-test` (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--disable-multi-gpu-test` (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--add-multi-gpu-test` (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.
- `--post-merge` (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- `--extra-stage "H100_PCIe-[Post-Merge]-1, xxx"` (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".
For guidance on mapping tests to stage names, see `docs/source/reference/ci-overview.md`.

kill
`kill`
Kill all running builds associated with pull request.

skip
`skip --comment COMMENT`
Skip testing for latest commit on pull request. `--comment "Reason for skipping build/test"` is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline
`reuse-pipeline`
Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.