Imagus Sieve Configuration

This document provides a guide to sieve configuration in Imagus. It is based on previous exchanges with the original developer of Imagus, with additions from the community.

Corrections and additional information are always welcome!

Input Processing

Before it's sent to matching, every URL has its protocol and any leading www. prefix removed using the following regex:

/^https?:\/\/(?:www\.)?/

This allows patterns to anchor on the host with ^. The prefix is re-added to the result URL later if the to value doesn't start with a protocol. For example, the URL https://some.url/thumbnail.jpg will be sent to the sieve as some.url/thumbnail.jpg.
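As a quick illustration, the stripping step can be reproduced in plain JavaScript. This is only a sketch: the helper name stripPrefix is hypothetical, and Imagus's internal code may be organized differently.

```javascript
// The prefix pattern quoted above: protocol plus an optional leading "www.".
const PREFIX = /^https?:\/\/(?:www\.)?/;

// Hypothetical helper name; not taken from the Imagus source.
function stripPrefix(url) {
  return url.replace(PREFIX, "");
}

const stripped = stripPrefix("https://some.url/thumbnail.jpg");

// With the prefix gone, a sieve pattern can anchor on the host with ^.
const matches = /^some\.url\/thumbnail\.jpg$/.test(stripped);

console.log(stripped, matches);
```

Note that only www. is removed; other subdomains (for example cdn.some.url) stay in the string that the sieve sees.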

Fields

`link`

Finds hoverable elements by applying a match expression to the href attribute of every a (anchor) tag on the page.

`url`

Allows the sieve to specify an alternative URL to be parsed by the res configuration, instead of the value matched by the link or img field configurations. This is only meaningful when the res field is also configured.

`res`

Loads and parses the current URL, generating the desired image URL.

While it's preferable to get the desired image URL using link or img configurations (since these are available on the current page), it's often necessary to parse the contents of a linked page to obtain the image URL.

In such cases, you can use the res field to load the URL specified by the link, img, or url field in the background, and parse the raw contents to obtain the actual image URL.

The res configuration can be structured in several ways, detailed below.

Regex Configuration

If the res field is configured with a single regex pattern, its first match group will be interpreted as the full image URL.

If a custom caption is desired, it can be specified in one of two ways:

If the configured regex pattern contains additional match groups, they will be concatenated to form the caption.
If a second regex pattern is specified, its first match group will be interpreted as the caption.
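The rules above can be sketched as a small function. This is not Imagus's actual implementation: the function name, the sample HTML, and the choice to join extra groups with a space are all assumptions made for illustration.

```javascript
// Sketch of the regex-style `res` behaviour described above.
// `parseWithRegexes` is a hypothetical name, not an Imagus API.
function parseWithRegexes(html, urlPattern, captionPattern) {
  const m = html.match(urlPattern);
  if (!m) return null;

  // The first match group is the full image URL.
  const url = m[1];

  // Additional groups are concatenated into the caption.
  // Assumption: a single space is used as the separator.
  let caption = m.slice(2).filter(Boolean).join(" ");

  // A second pattern, if given, supplies the caption from its first group.
  if (captionPattern) {
    const c = html.match(captionPattern);
    if (c) caption = c[1];
  }

  return caption ? [url, caption] : url;
}

const html =
  '<title>Sunset over the bay</title><img src="https://some.url/full.jpg" alt="Sunset">';

const urlOnly = parseWithRegexes(html, /<img src="([^"]+)"/);
const extraGroups = parseWithRegexes(html, /<img src="([^"]+)" alt="([^"]*)"/);
const secondPattern = parseWithRegexes(html, /<img src="([^"]+)"/, /<title>([^<]+)<\/title>/);
```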

JavaScript Configuration

For more complex parsing, res can be configured with a JavaScript block which will return image and caption information.

To specify a JavaScript configuration, the field must begin with a colon on its own line. A special $ variable is available in this context; it holds the loaded content and other helpful data.

Example

:
// Any groups captured by your `link` or `img` regex are added to `$` as subscripts.
// `$.length` tells you how many captures you have.
// `$[0]` is the full URL of the loaded page.
// `$[1]` is the first capture (and so on).
const firstCapture = $[1]
const lastCapture = $[($.length - 1)]
const ruleInfo = $.rule
const pageURL = $.url
const parentURL = $.base
const pageHTML = $._

// Parse `$._` and `return` as described below.
...

Return Values

Result	Caption	`return` Value
Image	Default	`[url]` or `url`
Image	Custom	`[url, caption]`
Gallery	Default	`[[urlA], [urlB]]`
Gallery	Custom	`[[urlA, captionA], [urlB, captionB]]`
Multi-Res Image	Default	`[[[url1x, #url2x]]]`
Multi-Res Image	Custom	`[[[url1x, #url2x], caption]]`
Multi-Res Gallery	Default	`[[[urlA1x, #urlA2x]], [[urlB1x, #urlB2x]]]`
Multi-Res Gallery	Custom	`[[[urlA1x, #urlA2x], captionA], [[urlB1x, #urlB2x], captionB]]`
Yellow Spinner	n/a	`null`
No Spinner	n/a	`false` or another "falsy" value
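To make the return shapes concrete, here is a hedged sketch of a res body that returns either a single URL or a gallery with custom captions. It is wrapped in a function named resBody (a hypothetical stand-in) so that it can run outside Imagus; in a real sieve only the statements inside the function would follow the leading colon. The img-tag pattern and the caption text are illustrative.

```javascript
// Stand-in for the code written after ":" in a `res` field.
// `$._` holds the loaded page content, as in the example above.
function resBody($) {
  // Collect every <img src="..."> in the loaded page (illustrative pattern).
  const urls = [...$._.matchAll(/<img[^>]+src="([^"]+)"/g)].map((m) => m[1]);

  // Nothing found: per the developer's notes, null turns the spinner yellow.
  if (urls.length === 0) return null;

  // Single image with the default caption: a bare URL string.
  if (urls.length === 1) return urls[0];

  // Gallery with a custom caption per entry: [[urlA, captionA], [urlB, captionB]].
  return urls.map((url, i) => [url, `Image ${i + 1} of ${urls.length}`]);
}

const single = resBody({ _: '<img src="https://some.url/a.jpg">' });
const gallery = resBody({
  _: '<img src="https://some.url/a.jpg"><img class="x" src="https://some.url/b.jpg">',
});
```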

`img`

Regex matching the src attribute of an IMG tag, or the style (e.g., background-image: url(thumbnail)) of any element.

`to`

to can have multiple lines; each one is converted to a result URL. If # is the first character of a line, it marks that URL as hi-res.

If # appears elsewhere in the line, it may be followed by space-separated strings and closed by a second #. This generates one URL per variant. E.g., //some.url/path/full-image.#jpg png gif# generates three URLs, which are tested in order.
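The hi-res marker and the variant syntax can be sketched as follows. This is an illustration of the described behaviour, not Imagus's code: the function name expandToLine and the returned object shape are assumptions, and a trailing media-type suffix is deliberately left unhandled.

```javascript
// Expand one line of a `to` field into candidate URLs.
// `expandToLine` and the { url, hiRes } shape are hypothetical.
function expandToLine(line) {
  // A leading "#" marks the resulting URL(s) as hi-res.
  const hiRes = line.startsWith("#");
  const body = hiRes ? line.slice(1) : line;

  // "#a b c#" elsewhere in the line lists space-separated variants.
  const m = body.match(/^(.*?)#([^#]+)#(.*)$/);
  const urls = m
    ? m[2].trim().split(/\s+/).map((variant) => m[1] + variant + m[3])
    : [body];

  // Order is preserved, since the variants are tested in order.
  return urls.map((url) => ({ url, hiRes }));
}

const variants = expandToLine("//some.url/path/full-image.#jpg png gif#");
const hiResLine = expandToLine("#//some.url/path/big.jpg");
```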

Also, you can add #{media_extension} at the end of the line so that the extension recognizes the result as video or audio instead of an image (the default). E.g., https://some.url/path/without/extension?id=13#mp4

The :\n prefix works here as well (but the $ value will hold only the matched groups).

JavaScript RegExp can be used with the built-in constructor:

RegExp('yourRegexInput')

The conversion happens with a simple URL.replace(yourRegex, rule.to).
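A minimal sketch of that conversion, using a hypothetical rule (the URLs and the rule object are made up for illustration):

```javascript
// Hypothetical sieve: the field names mirror the sieve fields,
// the patterns and URLs are invented for this example.
const rule = {
  link: "^some\\.url/thumbs/(\\d+)_small\\.jpg$",
  to: "some.url/full/$1.jpg",
};

// The URL as the sieve sees it, with protocol and www. already removed.
const strippedUrl = "some.url/thumbs/42_small.jpg";

// Capture groups from the pattern are available in `to` as $1, $2, ...
const result = strippedUrl.replace(RegExp(rule.link), rule.to);

console.log(result);
```

Because the pattern is passed to RegExp as a string, backslashes must be doubled when the pattern is written as a JavaScript string literal, as above; in the sieve editor the pattern is typed as-is.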

`note`

Comments containing historical context intended to assist with future sieve development, e.g.:

Details about the sieve's creation and changes
Contributor information
Example case URLs
Links to forum and Reddit posts with additional details

Options (Checkboxes)

Prioritize `img` over `link`

If the img and link fields are both configured, Imagus prefers the link result by default. This option inverts that behavior so that Imagus selects the img result.

Case Insensitive

If this option is selected, the regex pattern in that field is matched case-insensitively.

Loop Sieve with Result URL

Rechecks the result URL against the sieves. For example, a Google Images link may point to a Twitter thumbnail; if loop is enabled on the Google Images sieve, the resolved Twitter thumbnail is checked again so that the larger image can be found.
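The idea can be sketched with two made-up rules. Everything here is hypothetical: the rule objects, the loop property standing in for the checkbox, the example domains, and the pass limit are assumptions, and Imagus's real matching logic is more involved.

```javascript
// Two invented rules: the first unwraps a redirect link and has looping
// enabled, the second upgrades a thumbnail to the full-size image.
const rules = [
  { link: /^images\.example\/redirect\?u=(.+)$/, to: "$1", loop: true },
  { img: /^cdn\.example\/(\w+)_thumb\.jpg$/, to: "cdn.example/$1_full.jpg" },
];

// Resolve a URL, feeding the result back in while the matched rule loops.
// `maxPasses` is a safety limit chosen for this sketch.
function resolve(url, maxPasses = 5) {
  for (let pass = 0; pass < maxPasses; pass++) {
    const rule = rules.find((r) => (r.link || r.img).test(url));
    if (!rule) return url;
    url = url.replace(rule.link || rule.img, rule.to);
    if (!rule.loop) return url;
  }
  return url;
}

const resolved = resolve("images.example/redirect?u=cdn.example/abc_thumb.jpg");
```

Without loop on the first rule, the result would stop at the thumbnail URL instead of reaching the full-size image.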

Decode Matched Result URL

If selected, Imagus will apply URL decoding to the result. This is needed for some sites (e.g., Bing, Yandex) which include the image source as a URL-encoded parameter within a larger URL.
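A short illustration of why decoding is needed. The host, the mediaurl parameter name, and the use of decodeURIComponent are assumptions for this sketch; the exact decoding function Imagus calls is not confirmed here.

```javascript
// A search-result style link that carries the real image URL as an
// encoded query parameter (host and parameter name are invented).
const linkUrl =
  "search.example/images?mediaurl=https%3A%2F%2Fsome.url%2Fimage.jpg&w=100";

// A sieve pattern would capture the encoded value...
const encoded = linkUrl.match(/[?&]mediaurl=([^&]+)/)[1];

// ...and decoding turns it back into a usable URL.
const decoded = decodeURIComponent(encoded);

console.log(encoded);
console.log(decoded);
```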

Sources

Reddit Post

Taken from replies of this post.

First Reply

to can have multiple lines, every one of them will be converted to a result URL. If # is the first character in a line then it marks the URL as hi-res.

If # is not the first then it may be followed by space separated strings closed by a # sign again. This will generate URLs for every variant. E.g. //some.url/path/full-image.#jpg png gif#, which will generate three URLs, testing them in order.

Also, at the end of the line you can add #{media_extension}, so the the extension will recognize it as video or audio, instead of image (default). E.g., https://some.url/path/without/extension?id=13#mp4

Javascript RegExp is used (obviously). In the code it's simply RegExp('yourRegexInput'). The conversion happens with a simple URL.replace(yourRegex, rule.to).

res has different formats. If you simply write a regexp there, then the first group from the match will be the full image URL. If you have multiple groups, then starting with the second they will be concatenated to be used as caption. If you write a second regexp in a new line, then it's first group from the match will be the caption.

If res starts with :\n (so colon plus new line) then you can write JavaScript code there, and do whatever you want. A $ variable will be available in that piece of code, which holds the resolved content for example ($._). From res you can return null, this will make the spinner yellow. Other falsy values will hide the spinner. You can return a string, that will be the URL. An array with two members [URL, caption]. An array with multiple of the previous is an album.

The :\n works in to as well (but the $ value there will have only the matched groups).

From every URL, before it's sent to matching, the https?://(www\.)? part is removed (and added to the result URL later, if you don't start to with a protocol). So, you're basically matching some.url/thumbnail.jpg instead of https://some.url/thumbnail.jpg. The reason for this is to allow the use of ^ (which you already noticed).

Second Response

link matches the href attributes from a tags (so, something that is a page, and not a direct media file/thumbnail). img matches src from img tag (so something that is known to be an image file).

If res is used then link will be resolved. If you don't want to use link you can use the url parameter to modify the URL which will be resolved.

Possible (basic) combinations: (link or img) > to (both can be set at the same time and they would resolve to to, link with higher priority), (link or img) (> url)? > res. If link, res, img, and to are set at the same time, then link will be paired with res, and img with to.

The checkboxes beside the link and img have tooltips. The case-insensitiveness applies to the regex, the loop and decodeURI applies to the result URL.

Forum Post

Practically, regular expressions is the only knowledge you need to have (and a bit Javascript and HTML).

Parameters:

link : regex, it's working on "href" attribute of a link (A tag).

url : replacement, it has meaning only when "res" (next parameter) is set. Generates a URL for "res" if you need other, instead of link or image address.

res : regex match, if there is no way to get the larger image address with "link" or "img", this will load the page (if "url" parameter is not set, then the link or image address is used, otherwise the "url" replace) in background and will parse the content, and match the image URL.

img : regex, in for "src" attribute of an IMG tag or style="background-image: url(thumbnail)" for any elements.

to : [multiple]replacement or function, a link or image address will be replaced this based on "link" or "img".

note : is note.

Options:

Loop : it will recheck the result, e.g., a Google image link may point to a twitter thumbnail, and if you enable loop on Google images, then the resolved twitter thumbnail will be checked again, so it will get the larger image.

Decode URL : some providers, like Bing, Yandex... puts the encoded image address, as a parameter, into the URL.

Use img parameter : if both "link" or "res" and "img" are set, then "img" will be preferred.

Rules don't belong to specific sites, so all rules will be checked on any site when you hover your mouse on a link or thumbnail-like object. If no "link" parameter set/matched, and "img" is present, then "img" will be used on link address as well (this is when someone links to a thumbnail image), so no need to set the same value for "link" and "img", it's enough for the "img" (at least, this feature will be introduced in v0.8.10).
