Index out of bounds panic when using shortest_match_at, is_match_at with end-anchored regex #969

@wabain

What version of regex are you using?

1.7.2, 1.7.1

Describe the bug at a high level.

Regex::shortest_match_at panics with an index-out-of-bounds error for certain regex patterns, even when given seemingly valid inputs. Judging from the stack trace, this may be specific to patterns that are matched with a reversed FSM.

What are the steps to reproduce the behavior?

(playground)

fn main() {
    let re = regex::Regex::new(r"c.*d\z").unwrap();
    println!("{:?}", re.shortest_match_at("ababcd", 4));
}

The same backtrace occurs if shortest_match_at is replaced with is_match_at.

What is the actual behavior?

Compiling playground v0.0.1 (/playground)
    Finished dev [unoptimized + debuginfo] target(s) in 0.96s
     Running `target/debug/playground`
thread 'main' panicked at 'index out of bounds: the len is 2 but the index is 6', /playground/.cargo/registry/src/github.com-1ecc6299db9ec823/regex-1.7.1/src/dfa.rs:1444:54
stack backtrace:
   0: rust_begin_unwind
             at /rustc/8460ca823e8367a30dda430efda790588b8c84d3/library/std/src/panicking.rs:575:5
   1: core::panicking::panic_fmt
             at /rustc/8460ca823e8367a30dda430efda790588b8c84d3/library/core/src/panicking.rs:64:14
   2: core::panicking::panic_bounds_check
             at /rustc/8460ca823e8367a30dda430efda790588b8c84d3/library/core/src/panicking.rs:159:5
   3: regex::dfa::Fsm::start_flags_reverse
             at ./.cargo/registry/src/github.com-1ecc6299db9ec823/regex-1.7.1/src/dfa.rs:1444:54
   4: regex::dfa::Fsm::reverse
             at ./.cargo/registry/src/github.com-1ecc6299db9ec823/regex-1.7.1/src/dfa.rs:495:42
   5: <regex::exec::ExecNoSync as regex::re_trait::RegularExpression>::shortest_match_at
             at ./.cargo/registry/src/github.com-1ecc6299db9ec823/regex-1.7.1/src/exec.rs:457:23
   6: <regex::exec::ExecNoSyncStr as regex::re_trait::RegularExpression>::shortest_match_at
             at ./.cargo/registry/src/github.com-1ecc6299db9ec823/regex-1.7.1/src/exec.rs:397:9
   7: regex::re_unicode::Regex::shortest_match_at
             at ./.cargo/registry/src/github.com-1ecc6299db9ec823/regex-1.7.1/src/re_unicode.rs:629:9
   8: playground::main
             at ./src/main.rs:3:22
   9: core::ops::function::FnOnce::call_once
             at /rustc/8460ca823e8367a30dda430efda790588b8c84d3/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
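
The two numbers in the panic message happen to line up with the input: the suffix of "ababcd" starting at offset 4 is 2 bytes long, and the whole string is 6 bytes long. The sketch below only checks that arithmetic with the standard library. The helper name `suffix_and_total_len` is hypothetical, nothing here calls into regex, and the idea that the reverse search mixes up these two lengths is a guess rather than a confirmed diagnosis.

```rust
/// Hypothetical helper (not part of the regex crate): returns the byte length
/// of the suffix starting at `start` and the byte length of the whole text.
fn suffix_and_total_len(text: &str, start: usize) -> (usize, usize) {
    (text[start..].len(), text.len())
}

fn main() {
    let (suffix_len, total_len) = suffix_and_total_len("ababcd", 4);
    // Compare with "the len is 2 but the index is 6" in the panic message.
    println!("suffix len = {}, total len = {}", suffix_len, total_len);
}
```

If that guess is right, any start offset greater than 0 would hit the same bounds check for this pattern, but I have not verified that.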

What is the expected behavior?

I believe the output should be Some(6) in this case. Since 4 is the index of a character boundary in the input text, it certainly shouldn't panic.
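
To spell out why Some(6) seems right, here is a small hand-rolled check for this one pattern only (c.*d\z), using just the standard library. The function `expected_end` is hypothetical and is not how regex works internally. It assumes that `.` does not match a newline and that a match of an end-anchored pattern can only end at the end of the text.

```rust
/// Hypothetical stand-in for `shortest_match_at` on the single pattern
/// `c.*d\z`. Returns the expected end offset of a match, if any.
fn expected_end(text: &str, start: usize) -> Option<usize> {
    // An offset past the end or inside a multi-byte character is rejected
    // here instead of panicking on the slice below.
    if !text.is_char_boundary(start) {
        return None;
    }
    let tail = &text[start..];
    for (i, ch) in tail.char_indices() {
        if ch != 'c' {
            continue;
        }
        // Everything after this 'c' must be non-newline characters
        // followed by a final 'd' at the very end of the text.
        let rest = &tail[i + 1..];
        if let Some(middle) = rest.strip_suffix('d') {
            if !middle.contains('\n') {
                // `\z` pins the end of the match to the end of the text.
                return Some(text.len());
            }
        }
    }
    None
}

fn main() {
    println!("{:?}", expected_end("ababcd", 4));
}
```

For the input from the reproduction, offset 4 is a character boundary and the remaining text is "cd", so this check reports a match ending at offset 6.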
