This document provides a guide to sieve configuration in Imagus. It is based on previous exchanges with the original developer of Imagus, with additions from the community.
Corrections and additional information are always welcome!
Before it's sent to matching, every URL has its protocol and any leading `www.` prefix removed using the following regex:
/^https?:\/\/(?:www\.)?/
This allows sieve patterns to anchor on the start of the host with `^`. If the `to` value doesn't start with a protocol, the removed prefix is re-added to the result URL later. For example, the URL `https://some.url/thumbnail.jpg` will be sent to the sieve as `some.url/thumbnail.jpg`.
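The prefix stripping described above can be sketched as follows. This is a minimal illustration, not Imagus's own code: `stripPrefix` and the sample `pattern` are hypothetical names introduced here, and only the regex itself comes from this guide.

```javascript
// Prefix-stripping regex quoted in this guide.
const PREFIX = /^https?:\/\/(?:www\.)?/;

// Hypothetical helper: returns the string a sieve pattern would be tested against.
function stripPrefix(url) {
  return url.replace(PREFIX, "");
}

// Hypothetical `link`/`img` pattern anchored on the host with `^`.
const pattern = /^some\.url\/thumbnail\.(jpg|png)$/;

const stripped = stripPrefix("https://www.some.url/thumbnail.jpg");
console.log(stripped);               // some.url/thumbnail.jpg
console.log(pattern.test(stripped)); // true
```

Because the prefix is gone before matching, one anchored pattern covers the `http`, `https`, and `www.` variants of the same address.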
The `link` field finds hoverable elements by applying a match expression to the `href` attribute of every `a` (anchor) tag on the page.
The `url` field allows the sieve to specify an alternative URL to be parsed by the `res` configuration, instead of the value matched by the `link` or `img` field configurations. This is only meaningful when the `res` field is also configured.
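As a rough sketch of how a `url` replacement could redirect `res` to a different address, consider the example below. The pattern, the JSON endpoint, and the `resolveTarget` helper are all hypothetical, and the sketch assumes `url` behaves like an ordinary `String.prototype.replace` template applied with the `link` regex, which this guide does not spell out.

```javascript
// Hypothetical `link` pattern: captures a photo id from a photo-page link.
const link = /^some\.url\/photo\/(\d+)\/?$/;

// Hypothetical `url` replacement: points `res` at a JSON endpoint instead of the page.
const url = "some.url/api/photo/$1.json";

// Assumption: `url` is applied like a normal replace template using the `link` regex.
function resolveTarget(matchedAddress) {
  return matchedAddress.replace(link, url);
}

console.log(resolveTarget("some.url/photo/12345/")); // some.url/api/photo/12345.json
```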
The `res` field loads and parses the matched URL, generating the desired image URL.
While it's preferable to get the desired image URL using the `link` or `img` configurations (since these are available on the current page), it's often necessary to parse the contents of a linked page to obtain the image URL.
In such cases, you can use the `res` field to load the URL specified by `link`, `img`, or `url` in the background, and parse the raw contents to obtain the actual image URL.
The `res` configuration can be structured in several ways, detailed below.
If the `res` field is configured with a single regex pattern, its first match group will be interpreted as the full image URL.
If a custom caption is desired, it can be specified in one of two ways:
- If the configured regex pattern contains additional match groups, they will be concatenated to form the caption.
- If a second regex pattern is specified, its first match group will be interpreted as the caption.
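The two regex forms of `res` can be approximated like this. The page content, the patterns, and the `applySingle` / `applyPair` helpers are hypothetical stand-ins; in particular, the separator used when concatenating caption groups is an assumption, since this guide only says the groups are concatenated.

```javascript
// Hypothetical page content that `res` would have loaded in the background.
const pageHTML =
  '<img class="full" src="https://some.url/full/42.jpg" alt="Sunset" data-by="alice">';

// Form 1: one pattern; group 1 is the image URL, later groups form the caption.
const single = /<img class="full" src="([^"]+)" alt="([^"]*)" data-by="([^"]*)"/;

// Form 2: two patterns; group 1 of the second pattern is the caption.
const urlPattern = /<img class="full" src="([^"]+)"/;
const captionPattern = /alt="([^"]*)"/;

function applySingle(html, re) {
  const m = html.match(re);
  if (!m) return null;
  // Assumption: extra groups are joined with a space; the real separator is not documented here.
  const caption = m.slice(2).filter(Boolean).join(" ");
  return caption ? [m[1], caption] : [m[1]];
}

function applyPair(html, urlRe, captionRe) {
  const u = html.match(urlRe);
  if (!u) return null;
  const c = html.match(captionRe);
  return c ? [u[1], c[1]] : [u[1]];
}

console.log(applySingle(pageHTML, single));
// [ 'https://some.url/full/42.jpg', 'Sunset alice' ]
console.log(applyPair(pageHTML, urlPattern, captionPattern));
// [ 'https://some.url/full/42.jpg', 'Sunset' ]
```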
For more complex parsing, `res` can be configured with a JavaScript block which returns image and caption information.
To specify a JavaScript configuration, the configuration must begin with a colon on its own line. A special `$` variable will be available in this context, which holds the loaded content and other helpful data.
:
// Any groups captured by your `link` or `img` regex are added to `$` as subscripts.
// `$.length` tells you how many captures you have.
// `$[0]` is the full URL of the loaded page.
// `$[1]` is the first capture (and so on).
const firstCapture = $[1]
const lastCapture = $[($.length - 1)]
const ruleInfo = $.rule
const pageURL = $.url
const parentURL = $.base
const pageHTML = $._
// Parse `$._` and `return` as described below.
...
Result | Caption | `return` Value |
---|---|---|
Image | Default | `[url]` or `url` |
Image | Custom | `[url, caption]` |
Gallery | Default | `[[urlA], [urlB]]` |
Gallery | Custom | `[[urlA, captionA], [urlB, captionB]]` |
Multi-Res Image | Default | `[[[url1x, #url2x]]]` |
Multi-Res Image | Custom | `[[[url1x, #url2x], caption]]` |
Multi-Res Gallery | Default | `[[[urlA1x, #urlA2x]], [[urlB1x, #urlB2x]]]` |
Multi-Res Gallery | Custom | `[[[urlA1x, #urlA2x], captionA], [[urlB1x, #urlB2x], captionB]]` |
Yellow Spinner | n/a | `false` or other "falsy" value |
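To make the return shapes concrete, here is a hedged sketch of what the body of a `res` JavaScript block might look like. It is wrapped in a hypothetical `resBody` function and fed a stand-in `fakeDollar` object so that it can run outside Imagus; the markup being parsed and the object's exact shape are assumptions based on the listing above.

```javascript
// Hypothetical body of a `res` JavaScript block, wrapped in a function so it runs outside Imagus.
// In a real sieve, only the function body would follow the leading colon line.
function resBody($) {
  // `$._` holds the loaded page content, per the listing above.
  const images = [...$._.matchAll(/<img src="([^"]+)" alt="([^"]*)"/g)];
  if (images.length === 0) return false;                        // falsy: nothing found
  if (images.length === 1) return [images[0][1], images[0][2]]; // image with custom caption
  return images.map((m) => [m[1], m[2]]);                       // gallery with custom captions
}

// Stand-in for the `$` object that Imagus supplies (shape assumed for illustration).
const fakeDollar = {
  0: "https://some.url/album/7",
  1: "7",
  length: 2,
  _: '<img src="https://some.url/a.jpg" alt="A"><img src="https://some.url/b.jpg" alt="B">',
};

console.log(resBody(fakeDollar));
// [ [ 'https://some.url/a.jpg', 'A' ], [ 'https://some.url/b.jpg', 'B' ] ]
```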
The `img` field is a regex matching the `src` attribute of an `IMG` tag, or the style (e.g. `background-image: url(thumbnail)`) of any element.
The `to` field can have multiple lines; every one of them will be converted to a result URL.
If `#` is the first character in a line, it marks the URL as hi-res.
If `#` is not the first character, it may be followed by space-separated strings closed by another `#` sign. This will generate a URL for every variant. E.g. `//some.url/path/full-image.#jpg png gif#` will generate three URLs, which are tested in order.
Also, at the end of the line you can add `#{media_extension}` so that the extension will recognize the result as video or audio instead of an image (the default). E.g., `https://some.url/path/without/extension?id=13#mp4`
The `:\n` prefix works here as well (but the `$` value there will have only the matched groups).
Regexes are JavaScript regular expressions, built with the built-in constructor:
RegExp('yourRegexInput')
The conversion happens with a simple `URL.replace(yourRegex, rule.to)`.
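The replace-then-expand behavior of a `to` line can be illustrated roughly as follows. `expandToLine`, the `img` pattern, and the `to` template are hypothetical; the sketch assumes the regex replacement runs first and the `#a b c#` group is expanded afterwards, and it deliberately ignores the trailing `#{media_extension}` marker.

```javascript
// Hypothetical helper: expands one `to` line into the candidate URLs it describes.
// Assumption: only a leading hi-res `#` and one `#a b c#` variant group are handled.
function expandToLine(line) {
  const hiRes = line.startsWith("#");
  const body = hiRes ? line.slice(1) : line;
  const m = body.match(/^(.*?)#([^#\s]+(?: [^#\s]+)*)#(.*)$/);
  const urls = m ? m[2].split(" ").map((v) => m[1] + v + m[3]) : [body];
  return urls.map((url) => ({ url, hiRes }));
}

// Hypothetical `img` pattern and `to` template.
const img = /^some\.url\/thumbs\/(\w+)_small\.jpg$/;
const to = "#some.url/full/$1.#jpg png gif#";

// The conversion itself is a plain replace, as described above.
const replaced = "some.url/thumbs/cat_small.jpg".replace(img, to);
console.log(replaced); // #some.url/full/cat.#jpg png gif#

console.log(expandToLine(replaced).map((c) => c.url));
// [ 'some.url/full/cat.jpg', 'some.url/full/cat.png', 'some.url/full/cat.gif' ]
```

Each candidate would then be tried in order until one loads.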
The `note` field holds comments containing historical context intended to assist with future sieve development, e.g.:
- Details about the sieve's creation and changes
- Contributor information
- Example case URLs
- Links to forum and Reddit posts with additional details
If the `img` and `link` fields are both configured, Imagus will prefer the `link` result by default. The "Use img" option allows you to invert that behavior, so that Imagus selects the `img` result.
If the case-insensitive option is selected for the `link` or `img` field, the regex pattern in that field will be processed case-insensitively.
If the loop option is enabled, Imagus will recheck the result. For example, a Google Images link may point to a Twitter thumbnail; with loop enabled on the Google Images sieve, the resolved Twitter thumbnail is checked again so that the larger image can be found.
If the decode option is selected, Imagus will apply URL decoding to the result. This is needed for some sites (e.g., Bing, Yandex) which include the image source as a URL-encoded parameter within a larger URL.
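The reason decoding is needed can be seen in a small sketch. The search-result address, the `link` pattern, and the `to` value are hypothetical, and the sketch uses the standard `decodeURIComponent` function as an assumption; this guide does not state which decoding function Imagus actually calls.

```javascript
// Hypothetical search-result link carrying the real image address as an encoded parameter.
const resultLink =
  "some.search/images/view?mediaurl=https%3A%2F%2Fsome.url%2Fpath%2Ffull%20image.jpg&w=800";

// Hypothetical `link` pattern capturing the encoded parameter, with `to` set to "$1".
const link = /^some\.search\/images\/view\?mediaurl=([^&]+).*$/;
const to = "$1";

// Without decoding, the result is still percent-encoded and is not a usable address.
const encoded = resultLink.replace(link, to);
console.log(encoded); // https%3A%2F%2Fsome.url%2Fpath%2Ffull%20image.jpg

// Assumption: decoding the result behaves like decodeURIComponent.
const decoded = decodeURIComponent(encoded);
console.log(decoded); // https://some.url/path/full image.jpg
```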
Taken from replies of this post.
`to` can have multiple lines, every one of them will be converted to a result URL. If `#` is the first character in a line then it marks the URL as hi-res.
If `#` is not the first then it may be followed by space separated strings closed by a `#` sign again. This will generate URLs for every variant. E.g. `//some.url/path/full-image.#jpg png gif#`, which will generate three URLs, testing them in order.
Also, at the end of the line you can add `#{media_extension}`, so the extension will recognize it as video or audio, instead of image (default). E.g., `https://some.url/path/without/extension?id=13#mp4`
JavaScript RegExp is used (obviously). In the code it's simply `RegExp('yourRegexInput')`. The conversion happens with a simple `URL.replace(yourRegex, rule.to)`.
`res` has different formats. If you simply write a regexp there, then the first group from the match will be the full image URL. If you have multiple groups, then starting with the second they will be concatenated to be used as caption. If you write a second regexp in a new line, then its first group from the match will be the caption.
If `res` starts with `:\n` (so colon plus new line) then you can write JavaScript code there, and do whatever you want. A `$` variable will be available in that piece of code, which holds the resolved content for example (`$._`). From `res` you can return `null`, this will make the spinner yellow. Other falsy values will hide the spinner. You can return a string, that will be the URL. An array with two members is `[URL, caption]`. An array with multiple of the previous is an album.
The `:\n` works in `to` as well (but the `$` value there will have only the matched groups).
From every URL, before it's sent to matching, the `https?://(www\.)?` part is removed (and added to the result URL later, if you don't start `to` with a protocol). So, you're basically matching `some.url/thumbnail.jpg` instead of `https://some.url/thumbnail.jpg`. The reason for this is to allow the use of `^` (which you already noticed).
`link` matches the `href` attributes from `a` tags (so, something that is a page, and not a direct media file/thumbnail). `img` matches `src` from the `img` tag (so something that is known to be an image file).
If `res` is used then `link` will be resolved. If you don't want to use `link` you can use the `url` parameter to modify the URL which will be resolved.
Possible (basic) combinations: (`link` or `img`) > `to` (both can be set at the same time and they would resolve to `to`, `link` with higher priority), (`link` or `img`) (> `url`)? > `res`. If `link`, `res`, `img`, and `to` are set at the same time, then `link` will be paired with `res`, and `img` with `to`.
The checkboxes beside `link` and `img` have tooltips. The case-insensitiveness applies to the regex; the loop and decodeURI options apply to the result URL.
In practice, regular expressions are the only knowledge you need (plus a bit of JavaScript and HTML).
Parameters:
- link : regex, applied to the "href" attribute of a link (A tag).
- url : replacement; it only has meaning when "res" (next parameter) is set. Generates a URL for "res" if you need a different one from the link or image address.
- res : regex match; if there is no way to get the larger image address with "link" or "img", this will load the page in the background (using the link or image address if the "url" parameter is not set, otherwise the "url" replacement), parse the content, and match the image URL.
- img : regex, applied to the "src" attribute of an IMG tag or style="background-image: url(thumbnail)" for any element.
- to : [multiple] replacement or function; a link or image address matched by "link" or "img" will be replaced with this.
- note : a free-form note.
Options:
- Loop : it will recheck the result, e.g., a Google image link may point to a twitter thumbnail, and if you enable loop on Google images, then the resolved twitter thumbnail will be checked again, so it will get the larger image.
- Decode URL : some providers, like Bing and Yandex, put the encoded image address into the URL as a parameter.
- Use img parameter : if both "link" or "res" and "img" are set, then "img" will be preferred.
Rules don't belong to specific sites, so all rules will be checked on any site when you hover your mouse over a link or thumbnail-like object. If no "link" parameter is set or matched, and "img" is present, then "img" will be used on the link address as well (this is for when someone links to a thumbnail image), so there is no need to set the same value for "link" and "img"; setting "img" is enough (at least, this feature will be introduced in v0.8.10).