URI: be more strict with www. URIs

We recognize URIs that start with a scheme and a possibly empty
authority, and URI suffixes that start with "www.".

URIs starting with a scheme are of the form:

scheme://[ userinfo "@" ] host ...

while "www." URI suffixes are of the form:

www. <rest of host> ...

where the host is necessarily in reg-name form (not in IPv4address or
IP-literal form), and no userinfo component can appear.

This commit makes the parsing stricter, so that e.g.

www.example.com:foo@bar.com

is parsed as <URI>:<email> instead of as one long <URI>.
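The effect can be reproduced outside Konsole with a minimal C++ sketch. It assumes simplified, hypothetical stand-ins for the real fragments (`kScheme`, `kUserInfo`, `kHost`, `kPort` and the helper `firstUrl` are not Konsole names) and uses `std::regex` instead of `QRegularExpression`, so the lookbehind and the possessive quantifier are dropped. Only the position of the optional userinfo differs between the two patterns.

```cpp
#include <regex>
#include <string>

// Hypothetical, simplified stand-ins for Konsole's fragments. std::regex
// (ECMAScript) supports neither lookbehind nor possessive quantifiers, so
// the "(?<=^|...)" prefix and "*+" are omitted here.
const std::string kScheme = "[a-z][a-z0-9+\\-.]*://";
const std::string kUserInfo = "(?:[a-z0-9\\-._~%!$&'()*+,;=:]+@)?";
const std::string kHost = "[a-z0-9\\-._~%]+";  // reg-name only, no IP forms
const std::string kPort = "(?::[0-9]+)?";

// Before: the optional userinfo may follow either prefix, including "www.".
const std::regex kOldUrl("(?:www\\.|" + kScheme + ")" + kUserInfo + kHost + kPort);

// After: the userinfo lives inside the scheme branch of the group, so
// "www." is followed directly by the host.
const std::regex kNewUrl("(?:www\\.|" + kScheme + kUserInfo + ")" + kHost + kPort);

// Returns the leftmost match of `re` in `text`, or an empty string.
std::string firstUrl(const std::string& text, const std::regex& re) {
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str(0) : std::string();
}
```

With these definitions, `firstUrl("www.example.com:foo@bar.com", kOldUrl)` returns the whole input, while `kNewUrl` stops at `www.example.com`, leaving `foo@bar.com` to be recognized as an email. A scheme URI such as `http://user:pw@example.com:8080` still matches in full under `kNewUrl`.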
Luis Javier Merino Morán 4 years ago committed by Tomaz Canabrava
parent 64fb6409c0
commit da41c19a86
src/autotests/HotSpotFilterTest.cpp | 3
src/filterHotSpots/UrlFilter.cpp    | 4

src/autotests/HotSpotFilterTest.cpp
@@ -63,6 +63,9 @@ void HotSpotFilterTest::testUrlFilterRegex_data()
 << "http://example.com" << true;
 QTest::newRow("empty_fragment") << "http://example.com/#"
 << "http://example.com" << true;
+QTest::newRow("www_followed_by_colon") << "www.example.com:foo@bar.com"
+<< "www.example.com" << true;
 }
 void HotSpotFilterTest::testUrlFilterRegex()

src/filterHotSpots/UrlFilter.cpp
@@ -37,7 +37,8 @@ using namespace Konsole;
 // scheme://
 // - Must start with an ASCII letter, preceeded by any non-word character,
 // so "http" but not "mhttp"
-static const char scheme_or_www[] = "(?<=^|[\\s\\[\\]()'\"])(?:www\\.|[a-z][a-z0-9+\\-.]*+://)";
+static const char scheme_or_www[] = "(?<=^|[\\s\\[\\]()'\"])(?:www\\.|[a-z][a-z0-9+\\-.]*+://";
+static const char scheme_or_www_end[] = ")";
 // unreserved / pct-encoded / sub-delims
 #define COMMON_1 "a-z0-9\\-._~%!$&'()*+,;="
@@ -62,6 +63,7 @@ using LS1 = QLatin1String;
 const QRegularExpression UrlFilter::FullUrlRegExp(
     LS1(scheme_or_www)
     + LS1(userInfo)
+    + LS1(scheme_or_www_end)
     + LS1(host)
     + LS1(port)
     + LS1(path)
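The diff splits one balanced group across two constants: `scheme_or_www` now only opens the non-capturing group, and `scheme_or_www_end` closes it after `userInfo`. Below is a minimal sketch of that string assembly, using `std::string` and `std::regex` in place of `QLatin1String` and `QRegularExpression`. The placeholder fragments and the helpers `composeOld`, `composeNew` and `compiles` are hypothetical; only the parenthesis placement mirrors the real code.

```cpp
#include <regex>
#include <string>

// Hypothetical placeholders: only the grouping parentheses mirror
// scheme_or_www / scheme_or_www_end; the rest stands in for real fragments.
const std::string kPrefixOld = "(?:www\\.|SCHEME://)";
const std::string kPrefixOpen = "(?:www\\.|SCHEME://";  // deliberately unclosed
const std::string kPrefixEnd = ")";
const std::string kUserInfoStub = "USERINFO";
const std::string kHostStub = "HOST";

// Old layout: the group closes before the userinfo fragment.
std::string composeOld() { return kPrefixOld + kUserInfoStub + kHostStub; }

// New layout: the closing ")" is appended only after the userinfo fragment.
std::string composeNew() {
    return kPrefixOpen + kUserInfoStub + kPrefixEnd + kHostStub;
}

// True if `pattern` is a valid ECMAScript regex; an unclosed "(" makes the
// std::regex constructor throw std::regex_error.
bool compiles(const std::string& pattern) {
    try {
        static_cast<void>(std::regex(pattern));
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}
```

Keeping the closing parenthesis in its own constant presumably lets the existing `userInfo` fragment be reused unchanged; the trade-off is that `scheme_or_www` is no longer a valid pattern on its own and must always be paired with `scheme_or_www_end`.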
