User:Sun Creator/Avoid domains and URLs
Generic code
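Append the following to any regex rule to avoid matching inside correctly formed domains and URLs. The negative lookahead rejects a match that is followed, before any whitespace, by a dot and a word character; the negative lookbehind rejects a match that is preceded by a dot with no whitespace in between.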
(?![^\s\.]*\.\w)(?<!\.[^\s\.]{0,999})
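As a rough illustration of how the guard above behaves, here is a minimal sketch in TypeScript. It assumes JavaScript (ES2018 or later) regex semantics as a stand-in for the .NET engine that AWB uses, so details such as what `\w` and `\s` cover may differ. The helper name `findGuarded` and the sample text are made up for the example.

```typescript
// The guard from the text, written as a string so it can be appended to any rule.
// Assumption: JavaScript (ES2018+) regex semantics stand in for AWB's .NET engine.
const URL_GUARD = "(?![^\\s\\.]*\\.\\w)(?<!\\.[^\\s\\.]{0,999})";

// Hypothetical helper: append the guard to a rule and return the start
// offset of every match in the text.
function findGuarded(rule: string, text: string): number[] {
  const re = new RegExp(rule + URL_GUARD, "g");
  const offsets: number[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    offsets.push(m.index);
    if (m[0].length === 0) re.lastIndex++; // step past empty matches
  }
  return offsets;
}

// Made-up sample: "state" appears twice in prose and twice inside domains.
const sample =
  "The state of play: see www.state.gov or state.example.com, not the state.";

const guarded = findGuarded("\\bstate\\b", sample);
console.log(guarded); // offsets of the prose occurrences only
```

In this sample the final "state." is still matched, because the dot is followed by the end of the text rather than a word character, so a sentence-ending full stop is not mistaken for part of a domain.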
Tests
- Tests were run on the article List of Doctor Who universe creatures and aliens.
- Each time shown is the fastest result from repeated runs.
- On '\b' alone, which matches at the start and end of every word, to give a very high number of matches
- \b(?![^\s\.]*\.\w)(?<!\.[^\s\.]*) 491ms
- \b(?<!\.[^\s\.]*)(?![^\s\.]*\.\w) 493ms // Somewhat unexpected, given the results for the 'a' words below.
- \b(?<!\.[^\s\.]*) 408ms
- \b(?![^\s\.]*\.\w) 385ms
- \b 319ms
- Words with 'a' at the starting word boundary
- \ba(?![^\s\.]*\.\w)(?<!\.[^\s\.]*) 24ms
- \ba(?<!\.[^\s\.]*)(?![^\s\.]*\.\w) 22ms
- \ba(?![^\s\.]*\.\w) 20ms
- \ba(?<!\.[^\s\.]*) 18ms
- \ba 17ms
- Realistic test: 'state' occurred six times in the article at the time of testing.
- \bstate\b(?![^\s\.]*\.\w)(?<!\.[^\s\.]*) 2ms
- \bstate\b(?<!\.[^\s\.]*)(?![^\s\.]*\.\w) 2ms
- \bstate\b(?<!\.[^\s\.]*) 2ms
- \bstate\b(?![^\s\.]*\.\w) 2ms
- \bstate\b 2ms
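The comparison in the list above can be approximated with a small timing harness. The following is only a sketch: it assumes JavaScript (ES2018 or later) regex semantics rather than AWB's .NET engine, uses a made-up stand-in for the article text, and its helper names (`countMatches`, `fastestMs`) are hypothetical. Absolute times will not match the figures listed; the point is the method (fastest of several runs per variant) and the check that both orderings of the two assertions find the same matches.

```typescript
// Lookahead and lookbehind parts, using the `*` lookbehind form from the test list.
// Assumption: JavaScript (ES2018+) regex semantics stand in for AWB's .NET engine.
const AHEAD = "(?![^\\s\\.]*\\.\\w)";
const BEHIND = "(?<!\\.[^\\s\\.]*)";

// Count all matches, stepping past empty matches such as a bare \b.
function countMatches(pattern: string, text: string): number {
  const re = new RegExp(pattern, "g");
  let count = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    count++;
    if (m[0].length === 0) re.lastIndex++;
  }
  return count;
}

// Fastest wall-clock time over several runs, in milliseconds.
function fastestMs(pattern: string, text: string, runs: number): number {
  let best = Infinity;
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    countMatches(pattern, text);
    best = Math.min(best, Date.now() - start);
  }
  return best;
}

// Made-up stand-in for the article: 2000 copies of one sentence.
const unit = "The state of www.state.gov and state.example.com is a state. ";
const article = new Array(2001).join(unit);

const variants: string[] = [
  "\\bstate\\b" + AHEAD + BEHIND,
  "\\bstate\\b" + BEHIND + AHEAD,
  "\\bstate\\b" + BEHIND,
  "\\bstate\\b" + AHEAD,
  "\\bstate\\b",
];

for (let i = 0; i < variants.length; i++) {
  const v = variants[i];
  console.log(v, countMatches(v, article), fastestMs(v, article, 5) + "ms");
}
```

Taking the minimum over several runs mirrors the "fastest on repeat testing" approach used for the figures above and reduces noise from caching and other activity on the machine.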