
feat(eagle3):support qwen3 dense model #5879


Merged
merged 4 commits into from
Jul 18, 2025

Conversation

@xq25478 (Contributor) commented Jul 9, 2025

feat(eagle3): support the Qwen3 dense model with eagle_one_model=False

Summary by CodeRabbit

  • New Features

    • Added support for capturing hidden states and residuals during model inference for enhanced analysis.
    • Introduced a new accuracy entry for the Qwen3-8B model using the Eagle decoding algorithm.
    • Added a new integration test to evaluate the Qwen3-8B model with Eagle3 speculative decoding.
  • Tests

    • Included the new Eagle3 decoding test in the pre-merge test suite for H100 GPU systems.

@xq25478 xq25478 requested a review from a team as a code owner July 9, 2025 10:57
@xq25478 xq25478 requested review from lucaslie and byshiue July 9, 2025 10:57
@xq25478 xq25478 force-pushed the support_qwen3_dense_eagle3 branch from ced83ff to 99adbc4 Compare July 9, 2025 11:06
@svc-trtllm-gh-bot svc-trtllm-gh-bot added the Community want to contribute PRs initiated from Community label Jul 9, 2025
@StudyingShao (Collaborator)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #11435 [ run ] triggered by Bot

@juney-nvidia juney-nvidia requested review from mikeiovine and removed request for lucaslie July 9, 2025 13:23
@tensorrt-cicd (Collaborator)

PR_Github #11435 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8457 completed with status: 'FAILURE'

@juney-nvidia (Collaborator)

@mikeiovine for visibility on this Eagle-3 enablement for the Qwen3 dense model.

@byshiue (Collaborator) commented Jul 10, 2025

@xq25478 Can you add accuracy/test_llm_api_pytorch.py::TestQwen3_8B::test_eagle3 at https://github.com/xq25478/TensorRT-LLM/blob/99adbc4ad486e6606f85ea74af2e1ff1718044f1/tests/integration/test_lists/test-db/l0_h100.yml#L39 to enable your unit test?

@xq25478 (Contributor, Author) commented Jul 10, 2025

/bot run

@byshiue (Collaborator) commented Jul 10, 2025

/bot run

@byshiue (Collaborator) commented Jul 10, 2025

@xq25478 You missed the sign-off on the second commit, so the DCO check fails. Can you help to fix it?

@tensorrt-cicd (Collaborator)

PR_Github #11489 [ run ] triggered by Bot

@xq25478 xq25478 force-pushed the support_qwen3_dense_eagle3 branch from 67411bd to e592199 Compare July 10, 2025 02:42
@xq25478 (Contributor, Author) commented Jul 10, 2025

> You missed the sign-off on the second commit, so the DCO check fails. Can you help to fix it?

Fixed! Squashed into one commit.

@tensorrt-cicd (Collaborator)

PR_Github #11489 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8501 completed with status: 'FAILURE'

@xq25478 xq25478 force-pushed the support_qwen3_dense_eagle3 branch from e592199 to 1ee2c1c Compare July 10, 2025 08:00
@xq25478 (Contributor, Author) commented Jul 10, 2025

/bot run

@byshiue (Collaborator) commented Jul 10, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #11531 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #11531 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8534 completed with status: 'FAILURE'

@byshiue (Collaborator) commented Jul 11, 2025

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #11587 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #11587 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8581 completed with status: 'FAILURE'

@xq25478 xq25478 force-pushed the support_qwen3_dense_eagle3 branch from 1ee2c1c to e73e864 Compare July 11, 2025 06:32
@byshiue (Collaborator) commented Jul 16, 2025

The API has changed in the latest code: https://github.com/xq25478/TensorRT-LLM/blob/support_qwen3_dense_eagle3/tensorrt_llm/llmapi/llm_args.py#L337C17-L337C38

You can run

LLM_MODELS_ROOT=/tmp/Qwen3/ pytest -s tests/integration/defs/accuracy/test_llm_api_pytorch.py::TestQwen3_8B::test_eagle3

to verify.
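The call shape the updated test uses (visible in the yapf diff later in the thread) can be sketched with a minimal stand-in. `EagleDecodingConfigSketch` and the model path below are illustrative assumptions, not the real tensorrt_llm class:

```python
# Minimal stand-in mirroring the two EagleDecodingConfig fields the test uses.
# This is NOT the real tensorrt_llm class; the model path is hypothetical.
from dataclasses import dataclass
from typing import Optional


@dataclass
class EagleDecodingConfigSketch:
    max_draft_len: int                           # draft tokens proposed per step
    speculative_model_dir: Optional[str] = None  # path to the Eagle3 draft model


draft_len = 4
spec_config = EagleDecodingConfigSketch(
    max_draft_len=draft_len,
    speculative_model_dir="/tmp/Qwen3/eagle3-draft")  # hypothetical path
```

The real config object is then passed to the `LLM(...)` constructor alongside the target model directory, as the test diff shows.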

@xq25478 (Contributor, Author) commented Jul 16, 2025

> The API has changed in the latest code … to verify.

Thank you, fixed.

Signed-off-by: xq25478 <xq25478@qq.com>
@xq25478 xq25478 force-pushed the support_qwen3_dense_eagle3 branch from a4b7c0d to e9ebd36 Compare July 16, 2025 08:58
@byshiue (Collaborator) commented Jul 16, 2025

/bot run

@byshiue byshiue enabled auto-merge (squash) July 16, 2025 13:00
@tensorrt-cicd (Collaborator)

PR_Github #12088 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #12088 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #8979 completed with status: 'FAILURE'

@byshiue (Collaborator) commented Jul 17, 2025

@xq25478 It encounters an error when running `pre-commit run -a` before the CI. Can you help to fix it?


[2025-07-16T13:21:15.421Z] isort....................................................................Passed
[2025-07-16T13:21:15.421Z] CRLF end-lines remover...................................................Passed
[2025-07-16T13:21:15.421Z] yapf.....................................................................Failed
[2025-07-16T13:21:15.421Z] - hook id: yapf
[2025-07-16T13:21:15.421Z] - files were modified by this hook
[2025-07-16T13:21:15.421Z] check for added large files..............................................Passed
[2025-07-16T13:21:15.421Z] check for merge conflicts................................................Passed
[2025-07-16T13:21:15.421Z] check for broken symlinks............................(no files to check)Skipped
[2025-07-16T13:21:15.421Z] detect private key.......................................................Passed
[2025-07-16T13:21:15.421Z] fix end of files.........................................................Passed
[2025-07-16T13:21:15.421Z] check yaml...............................................................Passed
[2025-07-16T13:21:15.421Z] trim trailing whitespace.................................................Passed
[2025-07-16T13:21:15.421Z] check toml...............................................................Passed
[2025-07-16T13:21:15.421Z] mixed line ending........................................................Passed
[2025-07-16T13:21:15.421Z] debug statements (python)................................................Passed
[2025-07-16T13:21:15.421Z] check json...........................................(no files to check)Skipped
[2025-07-16T13:21:15.421Z] autoflake................................................................Passed
[2025-07-16T13:21:15.421Z] clang-format.............................................................Passed
[2025-07-16T13:21:15.421Z] cmake-format.............................................................Passed
[2025-07-16T13:21:15.421Z] codespell................................................................Passed
[2025-07-16T13:21:15.421Z] ruff.....................................................................Passed
[2025-07-16T13:21:15.421Z] ruff-format..............................................................Passed
[2025-07-16T13:21:15.421Z] mdformat.................................................................Passed
[2025-07-16T13:21:15.421Z] pre-commit hook(s) made changes.
[2025-07-16T13:21:15.421Z] If you are seeing this message in CI, reproduce locally with: `pre-commit run --all-files`.
[2025-07-16T13:21:15.421Z] To run `pre-commit` as part of git workflow, use `pre-commit install`.
[2025-07-16T13:21:15.421Z] All changes made by hooks:
[2025-07-16T13:21:15.421Z] diff --git a/tests/integration/defs/accuracy/test_llm_api_pytorch.py b/tests/integration/defs/accuracy/test_llm_api_pytorch.py
[2025-07-16T13:21:15.421Z] index 030fd6a..6777386 100644
[2025-07-16T13:21:15.421Z] --- a/tests/integration/defs/accuracy/test_llm_api_pytorch.py
[2025-07-16T13:21:15.421Z] +++ b/tests/integration/defs/accuracy/test_llm_api_pytorch.py
[2025-07-16T13:21:15.421Z] @@ -1619,8 +1619,7 @@ class TestQwen3_8B(LlmapiAccuracyTestHarness):
[2025-07-16T13:21:15.421Z]          draft_len = 4
[2025-07-16T13:21:15.421Z]          spec_config = EagleDecodingConfig(max_draft_len=draft_len,
[2025-07-16T13:21:15.421Z] -                                          speculative_model_dir=eagle_model_dir
[2025-07-16T13:21:15.421Z] -                                        )
[2025-07-16T13:21:15.421Z] +                                          speculative_model_dir=eagle_model_dir)
[2025-07-16T13:21:15.421Z]          llm = LLM(model=target_model_dir,
[2025-07-16T13:21:15.421Z]                    **pytorch_config,
[2025-07-16T13:21:15.421Z] Error: pre-commit checks failed

Signed-off-by: xq25478 <xq25478@qq.com>
auto-merge was automatically disabled July 18, 2025 03:43

Head branch was pushed to by a user without write access

@coderabbitai bot (Contributor) commented Jul 18, 2025

Walkthrough

The updates introduce speculative decoding support for the Qwen3 model by modifying model classes to accept and propagate spec_metadata, switching the causal LM class to inherit from a new base, and simplifying its constructor. Tests and reference files are updated to add and validate the Eagle speculative decoding algorithm for Qwen3-8B, including a new integration test.
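At a high level, speculative decoding drafts a few tokens with a small model and lets the target model verify them. The toy sketch below is purely illustrative — the "models" are arithmetic stand-ins, not the real Qwen3-8B target or Eagle3 draft model, and the acceptance rule is simplified greedy verification:

```python
# Toy draft-then-verify loop in the spirit of Eagle-style speculative decoding.
# greedy_target / greedy_draft are hypothetical stand-ins for real models.

def greedy_target(prefix):
    # Stand-in target model: next token is the prefix sum mod 7.
    return sum(prefix) % 7

def greedy_draft(prefix):
    # Stand-in draft model: agrees with the target until the sum grows large.
    return sum(prefix) % 7 if sum(prefix) < 20 else 0

def speculative_step(prefix, max_draft_len=4):
    """Propose max_draft_len draft tokens, then keep the longest verified run."""
    # Drafting phase: the cheap model extends the prefix autoregressively.
    draft, ctx = [], list(prefix)
    for _ in range(max_draft_len):
        tok = greedy_draft(ctx)
        draft.append(tok)
        ctx.append(tok)
    # Verification phase: the target model checks each draft token in order.
    accepted, ctx = [], list(prefix)
    for tok in draft:
        expect = greedy_target(ctx)
        if tok == expect:
            accepted.append(tok)
            ctx.append(tok)
        else:
            # First mismatch: take the target's token instead and stop.
            accepted.append(expect)
            ctx.append(expect)
            break
    else:
        # All drafts accepted: append one bonus token from the target.
        accepted.append(greedy_target(ctx))
    return accepted
```

When draft and target agree, one step emits several tokens instead of one, which is where the speedup comes from.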

Changes

| File(s) | Change Summary |
| --- | --- |
| tensorrt_llm/_torch/models/modeling_qwen3.py | Updated imports, added spec_metadata to forward methods, changed base class for causal LM, simplified constructor, and removed custom forward. |
| tests/integration/defs/accuracy/references/mmlu.yaml | Added Eagle speculative decoding accuracy entry for Qwen3-8B. |
| tests/integration/defs/accuracy/test_llm_api_pytorch.py | Added test_eagle3 method to test Eagle speculative decoding for Qwen3-8B. |
| tests/integration/test_lists/test-db/l0_h100.yml | Added the new Eagle3 test to the l0_h100 test group for pre-merge PyTorch testing. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Test as TestQwen3_8B.test_eagle3
    participant LLM as LLM (PyTorch)
    participant SpecModel as Eagle3 Model
    participant TargetModel as Qwen3-8B Model
    participant MMLU as MMLU Evaluator

    Test->>LLM: Instantiate with Eagle speculative config\n(spec_model=Eagle3, target_model=Qwen3-8B)
    LLM->>SpecModel: Load speculative model
    LLM->>TargetModel: Load target model
    Test->>LLM: Context enter (with LLM)
    Test->>MMLU: Evaluate LLM on MMLU
    MMLU->>LLM: Query for predictions
    LLM->>MMLU: Return answers (using Eagle speculative decoding)
    Test->>LLM: Context exit (cleanup)
```

Poem

A rabbit hopped to Eagle’s call,
Speculative dreams for Qwen3 enthrall.
Hidden states now captured neat,
New tests and configs—what a feat!
With YAMLs and code, we leap ahead,
On H100, our tests are led.
🦅✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e42f5a9 and eafdeaf.

📒 Files selected for processing (4)
  • tensorrt_llm/_torch/models/modeling_qwen3.py (5 hunks)
  • tests/integration/defs/accuracy/references/mmlu.yaml (1 hunks)
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py (1 hunks)
  • tests/integration/test_lists/test-db/l0_h100.yml (1 hunks)
🔇 Additional comments (4)
tests/integration/defs/accuracy/references/mmlu.yaml (1)

148-149: LGTM!

The new Eagle speculative decoding accuracy entry for Qwen3-8B is properly formatted and consistent with the existing entries.

tests/integration/test_lists/test-db/l0_h100.yml (1)

41-41: LGTM!

The new test entry for Eagle3 on Qwen3-8B is correctly added to the PyTorch pre-merge test suite.

tests/integration/defs/accuracy/test_llm_api_pytorch.py (1)

1610-1633: LGTM!

The Eagle3 test implementation follows the established patterns and correctly configures the speculative decoding parameters.

tensorrt_llm/_torch/models/modeling_qwen3.py (1)

19-21: LGTM!

The changes correctly integrate speculative decoding support into the Qwen3 model by:

  • Adding appropriate imports for SpecMetadata and SpecDecOneEngineForCausalLM
  • Properly propagating spec_metadata through the model layers
  • Correctly capturing hidden states for speculative decoding
  • Simplifying the Qwen3ForCausalLM class to inherit from the speculative decoding base class

Also applies to: 152-152, 176-179, 216-216, 237-237, 245-245, 247-254
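The spec_metadata propagation described above can be illustrated with a toy sketch. `SpecMetadataSketch` and `forward` are hypothetical stand-ins, not the actual TensorRT-LLM classes; they only show the capture pattern of registering layer indices and recording those layers' outputs during the forward pass:

```python
# Toy illustration of spec_metadata-style hidden-state capture: layers whose
# indices are registered in the metadata object store their outputs as they run.

class SpecMetadataSketch:
    def __init__(self, layers_to_capture):
        self.layers_to_capture = set(layers_to_capture)
        self.captured = {}  # layer index -> hidden state

    def maybe_capture(self, layer_idx, hidden):
        if layer_idx in self.layers_to_capture:
            self.captured[layer_idx] = hidden


def forward(hidden, num_layers, spec_metadata=None):
    """Run toy 'layers' (each adds 1); spec_metadata records chosen outputs."""
    for i in range(num_layers):
        hidden = hidden + 1  # stand-in for a transformer decoder layer
        if spec_metadata is not None:
            spec_metadata.maybe_capture(i, hidden)
    return hidden
```

Keeping `spec_metadata` optional, as here, lets the same forward path serve both plain inference and speculative decoding.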


@xq25478 (Contributor, Author) commented Jul 18, 2025

> @xq25478 It encounters an error when running `pre-commit run -a` before the CI. Can you help to fix it? …

Fixed.

@xq25478 (Contributor, Author) commented Jul 18, 2025

/bot run

@byshiue (Collaborator) commented Jul 18, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #12279 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #12279 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9118 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@byshiue byshiue merged commit 28858c8 into NVIDIA:main Jul 18, 2025
3 checks passed
@byshiue (Collaborator) commented Jul 18, 2025

@xq25478 Thank you for the contribution and for your patience in handling the CI issues. This PR is merged.

@xq25478 xq25478 deleted the support_qwen3_dense_eagle3 branch July 21, 2025 02:42
reasonsolo pushed a commit to reasonsolo/TensorRT-LLM that referenced this pull request Jul 21, 2025
Signed-off-by: xq25478 <xq25478@qq.com>
timlee0212 pushed a commit to timlee0212/TensorRT-LLM that referenced this pull request Jul 21, 2025
Signed-off-by: xq25478 <xq25478@qq.com>
NVShreyas pushed a commit to NVShreyas/TensorRT-LLM that referenced this pull request Jul 28, 2025
Signed-off-by: xq25478 <xq25478@qq.com>
Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>
Labels
Community want to contribute PRs initiated from Community