Filter, Rewrite and Scraper Rules

Feed Filtering Rules

Noflux has a basic filtering system that allows you to ignore or keep articles.

Block Rules

Block rules ignore articles whose title, entry URL, tag or author matches the regex (RE2 syntax).

For example, the regex (?i)noflux will ignore all articles with a title that contains the word Noflux (case insensitive).

Ignored articles won’t be saved into the database.

Keep Rules

Keep rules keep only articles that match the regex (RE2 syntax).

For example, the regex (?i)noflux will keep only the articles with a title that contains the word Noflux (case insensitive).
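
The feed-level block and keep rules can be pictured with a short Go sketch. It is only an illustration: the Article type and the shouldSave function are hypothetical names, not Noflux's internal API. It assumes that keep rules look at the same fields as block rules and that an empty rule means no rule is configured. Go's regexp package uses RE2 syntax, so patterns such as (?i)noflux behave as described above.

```go
package main

import (
	"fmt"
	"regexp"
)

// Article holds only the fields that block rules are described as checking.
// Hypothetical type, used for illustration only.
type Article struct {
	Title  string
	URL    string
	Author string
	Tags   []string
}

// matchesAny reports whether the regex matches the title, URL, author or any tag.
func matchesAny(re *regexp.Regexp, a Article) bool {
	if re.MatchString(a.Title) || re.MatchString(a.URL) || re.MatchString(a.Author) {
		return true
	}
	for _, tag := range a.Tags {
		if re.MatchString(tag) {
			return true
		}
	}
	return false
}

// shouldSave applies an optional block rule, then an optional keep rule.
// Assumption: an empty rule string means "no rule configured".
func shouldSave(blockRule, keepRule string, a Article) (bool, error) {
	if blockRule != "" {
		re, err := regexp.Compile(blockRule)
		if err != nil {
			return false, err
		}
		if matchesAny(re, a) {
			return false, nil // blocked articles are not saved
		}
	}
	if keepRule != "" {
		re, err := regexp.Compile(keepRule)
		if err != nil {
			return false, err
		}
		return matchesAny(re, a), nil // only matching articles are kept
	}
	return true, nil
}

func main() {
	a := Article{Title: "Noflux 2.0 released", URL: "https://example.org/post"}
	saved, err := shouldSave("(?i)noflux", "", a)
	fmt.Println(saved, err)
}
```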

Global Filtering Rules

Global filters are defined on the Settings page and are automatically applied to all articles from all feeds.

Rule Format (one rule per line):

FieldName=RegEx
FieldName=RegEx
FieldName=RegEx
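
As a rough illustration of this format, the Go sketch below parses a block of FieldName=RegEx lines. The Rule type and parseRules function are hypothetical names, and skipping blank lines and rejecting malformed lines are assumptions rather than documented Noflux behavior. Only the first = separates the field name from the regex, so the regex itself may contain = characters.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Rule is one parsed "FieldName=RegEx" line (hypothetical type).
type Rule struct {
	Field   string
	Pattern *regexp.Regexp
}

// parseRules parses one rule per line.
// Assumptions: blank lines are skipped and malformed lines are errors.
func parseRules(text string) ([]Rule, error) {
	var rules []Rule
	for _, line := range strings.Split(text, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		// Split on the first "=" only; the rest belongs to the regex.
		field, pattern, found := strings.Cut(line, "=")
		if !found || field == "" || pattern == "" {
			return nil, fmt.Errorf("invalid rule %q: expected FieldName=RegEx", line)
		}
		re, err := regexp.Compile(pattern)
		if err != nil {
			return nil, fmt.Errorf("invalid regex in rule %q: %w", line, err)
		}
		rules = append(rules, Rule{Field: field, Pattern: re})
	}
	return rules, nil
}

func main() {
	rules, err := parseRules("EntryTitle=(?i)noflux\nEntryTitle=a=b")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range rules {
		fmt.Println(r.Field, r.Pattern.String())
	}
}
```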

Available Fields:

Date Patterns

The EntryDate field supports the following date patterns:

Date format must be YYYY-MM-DD, for example: 2024-01-01
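
If you generate rules from a script, a date can be checked against this format before it is used. The sketch below relies on Go's time.Parse with the layout 2006-01-02, which corresponds to YYYY-MM-DD; the parseRuleDate helper is a hypothetical name and not part of Noflux.

```go
package main

import (
	"fmt"
	"time"
)

// parseRuleDate accepts only dates written as YYYY-MM-DD.
// Hypothetical helper, shown for illustration.
func parseRuleDate(s string) (time.Time, error) {
	// "2006-01-02" is Go's reference layout for year-month-day.
	return time.Parse("2006-01-02", s)
}

func main() {
	for _, s := range []string{"2024-01-01", "01/01/2024"} {
		if d, err := parseRuleDate(s); err != nil {
			fmt.Println(s, "is not a valid date:", err)
		} else {
			fmt.Println(s, "is valid:", d.Format("2006-01-02"))
		}
	}
}
```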

Block Rules

Block rules ignore articles that match at least one rule.

For example, the rule EntryTitle=(?i)noflux will ignore all articles with a title that contains the word Noflux (case insensitive).

For example:

Keep Rules

Keep rules keep articles that match at least one rule.

For example, the rule EntryTitle=(?i)noflux will keep only the articles with a title that contains the word Noflux (case insensitive).

For example:

Global Rules & Feed Rules Ordering

Rules are processed in this order:

  1. Global Block Rules
  2. Feed Block Rule
  3. Global Keep Rules
  4. Feed Keep Rule

Rewrite Rules

To improve the reading experience, it’s possible to alter the content of feed items.

For example, if you are reading a popular comic website like XKCD, it’s nice to have the image title (the title attribute) added under the image, especially on mobile devices where there is no hover event.

add_dynamic_image
Tries to add the highest quality images from sites that use JavaScript to load images (e.g. either lazily when scrolling or based on screen size).
add_dynamic_iframe
Tries to add embedded videos from sites that use JavaScript to load iframes (e.g. either lazily when scrolling or after the rest of the page is loaded).
add_image_title
Add each image's title as a caption under the image.
add_youtube_video
Insert a YouTube video into the article (automatic for YouTube.com).
add_youtube_video_from_id
Insert a YouTube video into the article based on the video ID.
add_invidious_video
Insert an Invidious player into the article (automatic for https://invidio.us).
add_youtube_video_using_invidious_player
Insert an Invidious player into the article for YouTube feeds.
add_castopod_episode
Insert a Castopod episode player.
add_mailto_subject
Insert the subject of mailto links into the article.
base64_decode
Decode base64 content. It can be used with a selector, base64_decode(".base64"), or without an argument, base64_decode. In the latter case it tries to convert all text nodes and falls back to the original text when it cannot decode it.
nl2br
Convert newlines (\n) to <br> (useful for non-HTML content).
convert_text_links
Convert text links to HTML links (useful for non-HTML content).
fix_medium_images
Attempt to fix Medium's images rendered with JavaScript.
use_noscript_figure_images
Use <noscript> content for images rendered with JavaScript.
replace("search term"|"replace term")
Search and replace text.
remove(".selector, #another_selector")
Remove DOM elements.
parse_markdown (Removed in v2.2.4)
Convert Markdown to HTML. This rule has been removed in version 2.2.4.
remove_tables
Remove any tables while keeping the content inside (useful for email newsletters).
remove_clickbait
Remove clickbait titles (Convert uppercase titles).
replace_title("search-term"|"replace-term")
Rewrite rule to adjust entry titles.
add_hn_links_using_hack
Open HN comments with Hack.
add_hn_links_using_opener
Open HN comments with Opener.
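
To give an idea of what two of the simpler rules in this list do, here is a minimal Go sketch of the behavior described for nl2br and for the fallback of base64_decode. It works on plain strings instead of DOM text nodes, the function names are hypothetical, and the extra UTF-8 check is an assumption added so that ordinary words that happen to be valid base64 are left alone. It is not the Noflux implementation.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
	"unicode/utf8"
)

// nl2br replaces every newline with a <br> tag.
func nl2br(text string) string {
	return strings.ReplaceAll(text, "\n", "<br>")
}

// decodeBase64OrKeep returns the decoded text, or the input unchanged when
// decoding fails. Assumption: decoded bytes that are not valid UTF-8 are
// treated as a failed decode.
func decodeBase64OrKeep(text string) string {
	decoded, err := base64.StdEncoding.DecodeString(strings.TrimSpace(text))
	if err != nil || !utf8.Valid(decoded) {
		return text // fall back to the original text
	}
	return string(decoded)
}

func main() {
	fmt.Println(nl2br("first line\nsecond line"))
	fmt.Println(decodeBase64OrKeep("aGVsbG8="))
	fmt.Println(decodeBase64OrKeep("not base64!"))
}
```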

Noflux includes a set of predefined rules for some websites, but you can also define your own rules.

On the feed edit page, enter your custom rules in the field “Rewrite Rules” like this:

rule1,rule2

Separate each rule by a comma.
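
Note that some rule arguments contain commas themselves, for example remove(".selector, #another_selector"). The Go sketch below shows one way such a list could be split: commas inside double quotes are not treated as separators. The splitRules function is a hypothetical helper written for this example, it does not handle escaped quotes, and it is not necessarily how Noflux parses the field.

```go
package main

import (
	"fmt"
	"strings"
)

// splitRules splits a comma-separated rule list, ignoring commas that
// appear inside double-quoted arguments. Hypothetical helper.
func splitRules(text string) []string {
	var rules []string
	var current strings.Builder
	inQuotes := false

	// flush stores the rule collected so far, if any, and starts a new one.
	flush := func() {
		if rule := strings.TrimSpace(current.String()); rule != "" {
			rules = append(rules, rule)
		}
		current.Reset()
	}

	for _, r := range text {
		switch {
		case r == '"':
			inQuotes = !inQuotes
			current.WriteRune(r)
		case r == ',' && !inQuotes:
			flush() // a comma outside quotes ends the current rule
		default:
			current.WriteRune(r)
		}
	}
	flush()
	return rules
}

func main() {
	list := `add_image_title, remove(".selector, #another_selector"),nl2br`
	for i, rule := range splitRules(list) {
		fmt.Println(i+1, rule)
	}
}
```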

Scraper Rules

When an article contains only an extract of the content, you can fetch the original web page and apply a set of rules to extract the relevant content.

Noflux uses CSS selectors for custom rules. These custom rules can be saved in the feed properties (select a feed and click Edit).

CSS Selector          Description
div#articleBody       Fetch a div element with the ID articleBody
div.content           Fetch all div elements with the class content
article, div.article  Use a comma to define multiple rules

Noflux includes a list of predefined rules for popular websites. You can contribute to the project to keep them up to date.

Under the hood, Noflux uses the library Goquery.

URL Rewrite Rules

Sometimes it might be necessary to rewrite a URL in a feed to fetch better-suited content.

For example, for some users the URL https://www.npr.org/sections/money/2021/05/18/997501946/the-case-for-universal-pre-k-just-got-stronger displays a cookie consent dialog instead of the actual content and it would be preferred to fetch the URL https://text.npr.org/997501946 instead.

The following rule does this:

rewrite("^https:\/\/www\.npr\.org\/(?:sections\/[^\/]+\/)?\d{4}\/\d{2}\/\d{2}\/(\d+)\/.*$"|"https://text.npr.org/$1")

This will rewrite all URLs from the original feed to URLs pointing to text.npr.org when the article content is fetched. A custom scraper rule is also needed, because the default rule will try to fetch #storytext.

Another example is the German page https://www.heise.de/news/Industrie-ruestet-sich-fuer-Gasstopp-Forscher-vorsichtig-optimistisch-7167721.html, which splits the article into multiple pages. The full text can be read at https://www.heise.de/news/Industrie-ruestet-sich-fuer-Gasstopp-Forscher-vorsichtig-optimistisch-7167721.html?seite=all

The URL rewrite rule for that would be:

rewrite("(.*?\.html)"|"$1?seite=all")
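
To check a rule before saving it, you can try it on a sample URL with a few lines of Go. This is a sketch under stated assumptions, not the Noflux implementation: the rewriteURL function and the ruleSyntax pattern are hypothetical, and the replacement is assumed to expand $1 the way Go's regexp.ReplaceAllString does. The NPR pattern used here allows an optional /sections/<name>/ prefix so that it matches the example URL above.

```go
package main

import (
	"fmt"
	"regexp"
)

// ruleSyntax extracts the two quoted parts of rewrite("search"|"replace").
// Hypothetical pattern, written for this example.
var ruleSyntax = regexp.MustCompile(`^rewrite\("(.*)"\|"(.*)"\)$`)

// rewriteURL applies one URL rewrite rule to a URL.
// Assumption: $1, $2, ... in the replacement refer to capture groups.
func rewriteURL(rule, url string) (string, error) {
	parts := ruleSyntax.FindStringSubmatch(rule)
	if parts == nil {
		return "", fmt.Errorf("invalid rewrite rule: %s", rule)
	}
	re, err := regexp.Compile(parts[1])
	if err != nil {
		return "", err
	}
	// URLs that do not match the pattern are returned unchanged.
	return re.ReplaceAllString(url, parts[2]), nil
}

const (
	nprRule   = `rewrite("^https:\/\/www\.npr\.org\/(?:sections\/[^\/]+\/)?\d{4}\/\d{2}\/\d{2}\/(\d+)\/.*$"|"https://text.npr.org/$1")`
	heiseRule = `rewrite("(.*?\.html)"|"$1?seite=all")`
)

func main() {
	samples := []struct{ rule, url string }{
		{nprRule, "https://www.npr.org/sections/money/2021/05/18/997501946/the-case-for-universal-pre-k-just-got-stronger"},
		{heiseRule, "https://www.heise.de/news/Industrie-ruestet-sich-fuer-Gasstopp-Forscher-vorsichtig-optimistisch-7167721.html"},
	}
	for _, s := range samples {
		out, err := rewriteURL(s.rule, s.url)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(out)
	}
}
```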