vendor/regex-automata-0.2.0/src/hybrid/mod.rs - toolchain/rustc - Git at Google

 /*!
 A module for building and searching with lazy determinstic finite automata
 (DFAs).

 Like other modules in this crate, lazy DFAs support a rich regex syntax with
 Unicode features. The key feature of a lazy DFA is that it builds itself
 incrementally during search, and never uses more than a configured capacity of
 memory. Thus, when searching with a lazy DFA, one must supply a mutable "cache"
 in which the actual DFA's transition table is stored.

 If you're looking for fully compiled DFAs, then please see the top-level
 [`dfa` module](crate::dfa).

 # Overview

 This section gives a brief overview of the primary types in this module:

 * A [`regex::Regex`] provides a way to search for matches of a regular
 expression using lazy DFAs. This includes iterating over matches with both the
 start and end positions of each match.
 * A [`dfa::DFA`] provides direct low level access to a lazy DFA.

 # Example: basic regex searching

 This example shows how to compile a regex using the default configuration
 and then use it to find matches in a byte string:

 ```
 use regex_automata::{hybrid::regex::Regex, MultiMatch};

 let re = Regex::new(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")?;
 let mut cache = re.create_cache();

 let text = b"2018-12-24 2016-10-08";
 let matches: Vec<MultiMatch> =
     re.find_leftmost_iter(&mut cache, text).collect();
 assert_eq!(matches, vec![
     MultiMatch::must(0, 0, 10),
     MultiMatch::must(0, 11, 21),
 ]);
 # Ok::<(), Box<dyn std::error::Error>>(())
 ```

 # Example: searching with regex sets

 The lazy DFAs in this module all fully support searching with multiple regexes
 simultaneously. You can use this support with standard leftmost-first style
 searching to find non-overlapping matches:

 ```
 use regex_automata::{hybrid::regex::Regex, MultiMatch};

 let re = Regex::new_many(&[r"\w+", r"\S+"])?;
 let mut cache = re.create_cache();

 let text = b"@foo bar";
 let matches: Vec<MultiMatch> =
     re.find_leftmost_iter(&mut cache, text).collect();
 assert_eq!(matches, vec![
     MultiMatch::must(1, 0, 4),
     MultiMatch::must(0, 5, 8),
 ]);
 # Ok::<(), Box<dyn std::error::Error>>(())
 ```

 Or use overlapping style searches to find all possible occurrences:

 ```
 use regex_automata::{hybrid::{dfa, regex::Regex}, MatchKind, MultiMatch};

 // N.B. For overlapping searches, we need the underlying lazy DFA to report all
 // possible matches.
 let re = Regex::builder()
     .dfa(dfa::Config::new().match_kind(MatchKind::All))
     .build_many(&[r"\w{3}", r"\S{3}"])?;
 let mut cache = re.create_cache();

 let text = b"@foo bar";
 let matches: Vec<MultiMatch> =
     re.find_overlapping_iter(&mut cache, text).collect();
 assert_eq!(matches, vec![
     MultiMatch::must(1, 0, 3),
     MultiMatch::must(0, 1, 4),
     MultiMatch::must(1, 1, 4),
     MultiMatch::must(0, 5, 8),
     MultiMatch::must(1, 5, 8),
 ]);
 # Ok::<(), Box<dyn std::error::Error>>(())
 ```

 # When should I use this?

 Generally speaking, if you can abide the use of mutable state during search,
 and you don't need things like capturing groups or Unicode word boundary
 support in non-ASCII text, then a lazy DFA is likely a robust choice with
 respect to both search speed and memory usage. Note however that its speed
 may be worse than a general purpose regex engine if you don't select a good
 [prefilter](crate::util::prefilter).

 If you know ahead of time that your pattern would result in a very large DFA
 if it was fully compiled, it may be better to use an NFA simulation instead
 of a lazy DFA. Either that, or increase the cache capacity of your lazy DFA
 to something that is big enough to hold the state machine (likely through
 experimentation). The issue here is that if the cache is too small, then it
 could wind up being reset too frequently and this might decrease searching
 speed significantly.

 # Differences with fully compiled DFAs

 A [`hybrid::regex::Regex`](crate::hybrid::regex::Regex) and a
 [`dfa::regex::Regex`](crate::dfa::regex::Regex) both have the same capabilities
 (and similarly for their underlying DFAs), but they achieve them through
 different means. The main difference is that a hybrid or "lazy" regex builds
 its DFA lazily during search, where as a fully compiled regex will build its
 DFA at construction time. While building a DFA at search time might sound like
 it's slow, it tends to work out where most bytes seen during a search will
 reuse pre-built parts of the DFA and thus can be almost as fast as a fully
 compiled DFA. The main downside is that searching requires mutable space to
 store the DFA, and, in the worst case, a search can result in a new state being
 created for each byte seen, which would make searching quite a bit slower.

 A fully compiled DFA never has to worry about searches being slower once
 it's built. (Aside from, say, the transition table being so large that it
 is subject to harsh CPU cache effects.) However, of course, building a full
 DFA can be quite time consuming and memory hungry. Particularly when it's
 so easy to build large DFAs when Unicode mode is enabled.

 A lazy DFA strikes a nice balance _in practice_, particularly in the
 presence of Unicode mode, by only building what is needed. It avoids the
 worst case exponential time complexity of DFA compilation by guaranteeing that
 it will only build at most one state per byte searched. While the worst
 case here can lead to a very high constant, it will never be exponential.

 # Syntax

 This module supports the same syntax as the `regex` crate, since they share the
 same parser. You can find an exhaustive list of supported syntax in the
 [documentation for the `regex` crate](https://docs.rs/regex/1/regex/#syntax).

 There are two things that are not supported by the lazy DFAs in this module:

 * Capturing groups. The DFAs (and [`Regex`](regex::Regex)es built on top
 of them) can only find the offsets of an entire match, but cannot resolve
 the offsets of each capturing group. This is because DFAs do not have the
 expressive power necessary.
 * Unicode word boundaries. These present particularly difficult challenges for
 DFA construction and would result in an explosion in the number of states.
 One can enable [`dfa::Config::unicode_word_boundary`] though, which provides
 heuristic support for Unicode word boundaries that only works on ASCII text.
 Otherwise, one can use `(?-u:\b)` for an ASCII word boundary, which will work
 on any input.

 There are no plans to lift either of these limitations.

 Note that these restrictions are identical to the restrictions on fully
 compiled DFAs.

 # Support for `alloc`-only

 This crate comes with `alloc` and `std` features that are enabled by default.
 One can disable the `std` feature and still use the full API of a lazy DFA.
 (You should use `std` when possible, since it permits providing implementations
 of the `std::error::Error` trait, and does enable some minor internal
 optimizations.)

 This module does require at least the `alloc` feature though. It is not
 available in any capacity without `alloc`.
 */

 pub use self::{
     error::{BuildError, CacheError},
     id::{LazyStateID, OverlappingState},
 };

 pub mod dfa;
 mod error;
 mod id;
 pub mod regex;
 mod search;
	/*!
	A module for building and searching with lazy determinstic finite automata
	(DFAs).

	Like other modules in this crate, lazy DFAs support a rich regex syntax with
	Unicode features. The key feature of a lazy DFA is that it builds itself
	incrementally during search, and never uses more than a configured capacity of
	memory. Thus, when searching with a lazy DFA, one must supply a mutable "cache"
	in which the actual DFA's transition table is stored.

	If you're looking for fully compiled DFAs, then please see the top-level
	[`dfa` module](crate::dfa).

	# Overview

	This section gives a brief overview of the primary types in this module:

	* A [`regex::Regex`] provides a way to search for matches of a regular
	expression using lazy DFAs. This includes iterating over matches with both the
	start and end positions of each match.
	* A [`dfa::DFA`] provides direct low level access to a lazy DFA.

	# Example: basic regex searching

	This example shows how to compile a regex using the default configuration
	and then use it to find matches in a byte string:

	```
	use regex_automata::{hybrid::regex::Regex, MultiMatch};

	let re = Regex::new(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")?;
	let mut cache = re.create_cache();

	let text = b"2018-12-24 2016-10-08";
	let matches: Vec<MultiMatch> =
	re.find_leftmost_iter(&mut cache, text).collect();
	assert_eq!(matches, vec![
	MultiMatch::must(0, 0, 10),
	MultiMatch::must(0, 11, 21),
	]);
	# Ok::<(), Box<dyn std::error::Error>>(())
	```

	# Example: searching with regex sets

	The lazy DFAs in this module all fully support searching with multiple regexes
	simultaneously. You can use this support with standard leftmost-first style
	searching to find non-overlapping matches:

	```
	use regex_automata::{hybrid::regex::Regex, MultiMatch};

	let re = Regex::new_many(&[r"\w+", r"\S+"])?;
	let mut cache = re.create_cache();

	let text = b"@foo bar";
	let matches: Vec<MultiMatch> =
	re.find_leftmost_iter(&mut cache, text).collect();
	assert_eq!(matches, vec![
	MultiMatch::must(1, 0, 4),
	MultiMatch::must(0, 5, 8),
	]);
	# Ok::<(), Box<dyn std::error::Error>>(())
	```

	Or use overlapping style searches to find all possible occurrences:

	```
	use regex_automata::{hybrid::{dfa, regex::Regex}, MatchKind, MultiMatch};

	// N.B. For overlapping searches, we need the underlying lazy DFA to report all
	// possible matches.
	let re = Regex::builder()
	.dfa(dfa::Config::new().match_kind(MatchKind::All))
	.build_many(&[r"\w{3}", r"\S{3}"])?;
	let mut cache = re.create_cache();

	let text = b"@foo bar";
	let matches: Vec<MultiMatch> =
	re.find_overlapping_iter(&mut cache, text).collect();
	assert_eq!(matches, vec![
	MultiMatch::must(1, 0, 3),
	MultiMatch::must(0, 1, 4),
	MultiMatch::must(1, 1, 4),
	MultiMatch::must(0, 5, 8),
	MultiMatch::must(1, 5, 8),
	]);
	# Ok::<(), Box<dyn std::error::Error>>(())
	```

	# When should I use this?

	Generally speaking, if you can abide the use of mutable state during search,
	and you don't need things like capturing groups or Unicode word boundary
	support in non-ASCII text, then a lazy DFA is likely a robust choice with
	respect to both search speed and memory usage. Note however that its speed
	may be worse than a general purpose regex engine if you don't select a good
	[prefilter](crate::util::prefilter).

	If you know ahead of time that your pattern would result in a very large DFA
	if it was fully compiled, it may be better to use an NFA simulation instead
	of a lazy DFA. Either that, or increase the cache capacity of your lazy DFA
	to something that is big enough to hold the state machine (likely through
	experimentation). The issue here is that if the cache is too small, then it
	could wind up being reset too frequently and this might decrease searching
	speed significantly.

	# Differences with fully compiled DFAs

	A [`hybrid::regex::Regex`](crate::hybrid::regex::Regex) and a
	[`dfa::regex::Regex`](crate::dfa::regex::Regex) both have the same capabilities
	(and similarly for their underlying DFAs), but they achieve them through
	different means. The main difference is that a hybrid or "lazy" regex builds
	its DFA lazily during search, where as a fully compiled regex will build its
	DFA at construction time. While building a DFA at search time might sound like
	it's slow, it tends to work out where most bytes seen during a search will
	reuse pre-built parts of the DFA and thus can be almost as fast as a fully
	compiled DFA. The main downside is that searching requires mutable space to
	store the DFA, and, in the worst case, a search can result in a new state being
	created for each byte seen, which would make searching quite a bit slower.

	A fully compiled DFA never has to worry about searches being slower once
	it's built. (Aside from, say, the transition table being so large that it
	is subject to harsh CPU cache effects.) However, of course, building a full
	DFA can be quite time consuming and memory hungry. Particularly when it's
	so easy to build large DFAs when Unicode mode is enabled.

	A lazy DFA strikes a nice balance _in practice_, particularly in the
	presence of Unicode mode, by only building what is needed. It avoids the
	worst case exponential time complexity of DFA compilation by guaranteeing that
	it will only build at most one state per byte searched. While the worst
	case here can lead to a very high constant, it will never be exponential.

	# Syntax

	This module supports the same syntax as the `regex` crate, since they share the
	same parser. You can find an exhaustive list of supported syntax in the
	[documentation for the `regex` crate](https://docs.rs/regex/1/regex/#syntax).

	There are two things that are not supported by the lazy DFAs in this module:

	* Capturing groups. The DFAs (and [`Regex`](regex::Regex)es built on top
	of them) can only find the offsets of an entire match, but cannot resolve
	the offsets of each capturing group. This is because DFAs do not have the
	expressive power necessary.
	* Unicode word boundaries. These present particularly difficult challenges for
	DFA construction and would result in an explosion in the number of states.
	One can enable [`dfa::Config::unicode_word_boundary`] though, which provides
	heuristic support for Unicode word boundaries that only works on ASCII text.
	Otherwise, one can use `(?-u:\b)` for an ASCII word boundary, which will work
	on any input.

	There are no plans to lift either of these limitations.

	Note that these restrictions are identical to the restrictions on fully
	compiled DFAs.

	# Support for `alloc`-only

	This crate comes with `alloc` and `std` features that are enabled by default.
	One can disable the `std` feature and still use the full API of a lazy DFA.
	(You should use `std` when possible, since it permits providing implementations
	of the `std::error::Error` trait, and does enable some minor internal
	optimizations.)

	This module does require at least the `alloc` feature though. It is not
	available in any capacity without `alloc`.
	*/

	pub use self::{
	error::{BuildError, CacheError},
	id::{LazyStateID, OverlappingState},
	};

	pub mod dfa;
	mod error;
	mod id;
	pub mod regex;
	mod search;