8000 [HtmlSanitizer] Introduce HtmlSanitizer component by tgalopin · Pull Request #44681 · symfony/symfony · GitHub
[go: up one dir, main page]

Skip to content

[HtmlSanitizer] Introduce HtmlSanitizer component #44681

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 28, 2021

Conversation

tgalopin
Copy link
Contributor
@tgalopin tgalopin commented Dec 16, 2021
Q A
Branch? 6.1
Bug fix? no
New feature? yes
Deprecations? no
Tickets Fix #44144
License MIT
Doc PR -

Following discussion on #44144, this PR proposes to add a new HtmlSanitizer component to Symfony.

I refactored the infrastructure of my original html-sanitizer to better follow the W3C Standard Proposal about sanitizers.

The sanitizer can be used as follow:

use Symfony\Component\HtmlSanitizer\HtmlSanitizerConfig;
use Symfony\Component\HtmlSanitizer\HtmlSanitizer;

// By default, an element not added to the allowed or blocked elements
// will be dropped, including its children
$config = (new HtmlSanitizerConfig())
    // Allow "safe" elements and attributes. All scripts will be removed
    // as well as other dangerous behaviors like CSS injection
    ->allowSafeElements()

    // Allow all static elements and attributes from the W3C Sanitizer API
    // standard. All scripts will be removed but the output may still contain
    // other dangerous behaviors like CSS injection (click-jacking), CSS
    // expressions, ...
    ->allowAllStaticElements()

    // Allow the "div" element and no attribute can be on it
    ->allowElement('div')

    // Allow the "a" element, and the "title" attribute to be on it
    ->allowElement('a', ['title'])

    // Allow the "span" element, and any attribute from the Sanitizer API is allowed
    // (see https://wicg.github.io/sanitizer-api/#default-configuration)
    ->allowElement('span', '*')

    // Block the "section" element: this element will be removed but
    // its children will be retained
    ->blockElement('section')

    // Drop the "div" element: this element will be removed, including its children
    ->dropElement('div')

    // Allow the attribute "title" on the "div" element
    ->allowAttribute('title', ['div'])

    // Allow the attribute "data-custom-attr" on all currently allowed elements
    ->allowAttribute('data-custom-attr', '*')

    // Drop the "data-custom-attr" attribute from the "div" element:
    // this attribute will be removed
    ->dropAttribute('data-custom-attr', ['div'])

    // Drop the "data-custom-attr" attribute from all elements:
    // this attribute will be removed
    ->dropAttribute('data-custom-attr', '*')

    // Forcefully set the value of all "rel" attributes on "a"
    // elements to "noopener noreferrer"
    ->forceAttribute('a', 'rel', 'noopener noreferrer')

    // Transform all HTTP schemes to HTTPS
    ->forceHttpsUrls()

    // Configure which schemes are allowed in links (others will be dropped)
    ->allowedLinkSchemes(['https', 'http', 'mailto'])

    // Configure which hosts are allowed in links (by default all are allowed)
    ->allowedLinkHosts(['symfony.com', 'example.com'])

    // Allow relative URL in links (by default they are dropped)
    ->allowRelativeLinks()

    // Configure which schemes are allowed in img/audio/video/iframe (others will be dropped)
    ->allowedMediaSchemes(['https', 'http'])

    // Configure which hosts are allowed in img/audio/video/iframe (by default all are allowed)
    ->allowedMediaHosts(['symfony.com', 'example.com'])

    // Allow relative URL in img/audio/video/iframe (by default they are dropped)
    ->allowRelativeMedias()

    // Configure a custom attribute sanitizer to apply custom sanitization logic
    // ($attributeSanitizer instance of AttributeSanitizerInterface)
    ->withAttributeSanitizer($attributeSanitizer)

    // Unregister a previously registered attribute sanitizer
    // ($attributeSanitizer instance of AttributeSanitizerInterface)
    ->withoutAttributeSanitizer($attributeSanitizer)
;

$sanitizer = new HtmlSanitizer($config);

// Sanitize a given string, using the configuration provided, and in the
// "body" context (tags only allowed in <head> will be removed)
$sanitizer->sanitize($userInput);

// Sanitize the given string for a usage in a <head> tag
$sanitizer->sanitizeFor('head', $userInput);

// Sanitize the given string for a usage in another tag
$sanitizer->sanitizeFor('title', $userInput); // Will encode as HTML entities
$sanitizer->sanitizeFor('textarea', $userInput); // Will encode as HTML entities
$sanitizer->sanitizeFor('div', $userInput); // Will sanitize as body
$sanitizer->sanitizeFor('section', $userInput); // Will sanitize as body
// ...

The component is working and well tested.

I did some very basic performance testing and it seems to be **~ 2 times faster** than my original library, mostly by relying on much less objects and more on native primitives. My original library was already as fast as the fastest PHP HTML sanitizers, meaning this component may well be the fastest sanitizer in the PHP ecosystem (to better check of course!). That's exciting :) !

I still have on my TODO list:

  • Review and update the annotations in all classes and methods (especially to mark the component as experimental) ;
  • Add the FameworkBundle integration to use it in Symfony apps

Feel free to review the component while I work on finalizing it and the FameworkBundle integration!

@xorik
Copy link
xorik commented Dec 16, 2021

Hi.

First of all, thank you for you effort, this is really cool idea!

Second: I'm not a Symony team member, so feel free to ignore me.

My concern is that configuring will create a bunch of temporary objects. Maybe it's better to make a separate builder class, something like:

<?php

$config = (new HtmlSanitizerConfigBuilder())
    ->allowElement('div')
    ->allowElement('a', ['title'])
    ->getConfig()
;
$sanitizer = new HtmlSanitizer($config);

And call constructor only in the getConfig method, and not on every step, so it will be less CPU intensive. But maybe it's not required, if this will be somehow optimized in DI to a single constructor call.

@tgalopin
Copy link
Contributor Author

@xorik using a builder for the config could be an alternative indeed, but I think there is an advantage of using an immutable object and withers: the ability to store a "standard" configuration as a service and use it as a prototype to configure in your application. Using a builder would make this more complex while introducing a state in the service, which is IMO not a good idea.

Also, having a few more instantiated objects won't have any measurable performance impact, so I think we should stick to an immutable configuration object (immutability was proposed by @stof and @nicolas-grekas and I think that's a great idea).

@OskarStark I'm reviewing all your feedback (thanks for it!), I don't think we put final keywords on tests classes anywhere or did we? I feel like this is unnecessary as tests are excluded from the BC promise anyway.

Applying the other changes now :)

@tgalopin tgalopin force-pushed the html-sanitizer branch 2 times, most recently from bab4831 to 00d7e22 Compare December 17, 2021 10:53
Copy link
Member
@nicolas-grekas nicolas-grekas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some random notes :)

@tgalopin
Copy link
Contributor Author

I updated the PR for requested changes.

@nicolas-grekas
Copy link
Member

It'd be nice to be able to force an attribute on an element, eg rel=nofollow on <a>

Copy link
Contributor
@tigitz tigitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some missing return types

@tigitz
Copy link
Contributor
tigitz commented Dec 19, 2021

General question, is it planned to support only specific browser versions ?

I'm asking this because <BGSOUND SRC="javascript:alert('XSS');"> (found here) for example is not tested (so not supported I guess ?) but seems to be problematic only for IE AFAIU (https://developer.mozilla.org/fr/docs/Web/HTML/Element/bgsound).

So it could be legitimately ignored because it's an outdated browser but library consumers that still needs to support those browsers (and they exist) will probably require the package and won't realise it actually does not cover their needs by design.

If that's the plan I would recommend to be more upfront about the supported browsers and versions, whether it's phpdoc or doc itself.

@tgalopin
Copy link
Contributor Author

General question, is it planned to support only specific browser versions ?

I think the aim is to be as safe as possible, ie. ensuring we remove all sources of XSS for all browsers.

You raise an important question though: I'm the one who originally explored sources of potential XSS evasions, and I may have missed some. Your example is a good one: <BGSOUND SRC="javascript:alert('XSS');"> was indeed missing (because it wasn't listed on https://www.w3schools.com/tags/att_src.asp and I used this reference for the attribute sanitizer).

I'm going to explore https://cheatsheetseries.owasp.org/cheatsheets/XSS_Filter_Evasion_Cheat_Sheet.html in more details (thanks for the link!) but I think if people reviewing this PR could also have a look at potentially missed XSS evasion strategies, that's be great!

@tgalopin
Copy link
Contributor Author

@tigitz I added a test and a sanitization for your case FYI.

@nicolas-grekas I just added the ability to call ->forceAttribute('a', 'rel', 'noopener noreferrer') to force the value of an attribute on an element.

@stof

should we allow using * inside the array too, in case we want to allow all standard attributes plus some custom ones ?

I guess one could easily call allowAttribute right after? I think it would complexify a bit the code without much added benefit but I can do it if it's a deal breaker.

@tigitz
Copy link
Contributor
tigitz commented Dec 19, 2021

I think the aim is to be as safe as possible, ie. ensuring we remove all sources of XSS for all browsers.

Understood. I guess you evaluate the performance penalty of doing unnecessary "outdated browsers" sanitation checks in a "modern browsers" context to be marginal ?

if people reviewing this PR could also have a look at potentially missed XSS evasion strategies, that's be great!

Sure! For completeness, I found the given link through test suites of other open-source html sanitization libs. Could be useful to increase case coverage:

@tgalopin
Copy link
Contributor Author

I pushed more cases coming from https://cheatsheetseries.owasp.org/cheatsheets/XSS_Filter_Evasion_Cheat_Sheet.html#bgsound (thanks @tigitz !) and I refactored the default attribute sanitizer to check all elements for URL attributes ('src', 'href', 'lowsrc', 'background', 'ping'). It should be more reliable and safe than having a manual list.

Copy link
Member
@derrabus derrabus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's document list arrays a bit better.

@tgalopin
Copy link
Contributor Author

I fixed all reviews :)

@tgalopin
Copy link
Contributor Author

Updated

@tgalopin tgalopin force-pushed the html-sanitizer branch 2 times, most recently from f069b06 to 72381ee Compare December 23, 2021 21:43
Copy link
Member
@nicolas-grekas nicolas-grekas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with the component. I guess you want to proceed with FWB integration in another PR?

@tgalopin
Copy link
Contributor Author

@nicolas-grekas I'm working on FWB integration but we could indeed open a new PR for that. I don't mind either way

@tgalopin
Copy link
Contributor Author

Let's open a different PR, it'll ease reviewing of the configuration/FW integration without having too much comments here.

@derrabus are you OK with the component now?

@tgalopin
Copy link
Contributor Author

Just opened #44798 for the FWB integration

Copy link
Member
@fabpot fabpot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge early to gather feedback.
Well done @tgalopin

@fabpot
Copy link
Member
fabpot commented Dec 28, 2021

Thank you @tgalopin.

@fabpot fabpot merged commit 424de23 into symfony:6.1 Dec 28, 2021
@tgalopin tgalopin deleted the html-sanitizer branch December 28, 2021 10:39
fabpot added a commit that referenced this pull request Apr 15, 2022
…t (tgalopin, wouterj)

This PR was merged into the 6.1 branch.

Discussion
----------

[FrameworkBundle] Integrate the HtmlSanitizer component

| Q             | A
| ------------- | ---
| Branch?       | 6.1
| Bug fix?      | no
| New feature?  | yes
| Deprecations? | no
| Tickets       | -
| License       | MIT
| Doc PR        | -

This PR adds the integration if the HtmlSanitizer component in the FrameworkBundle. See #44681 for details about the component.

The configuration for this integration is the following:

```yaml
framework:
    # This configuration is not required: as soon as you install the component, a default
    # "html_sanitizer" service is created, with the safe configuration, to be used directly.
    #
    # This configuration allows to set custom behaviors, in addition or instead of the default.
    html_sanitizer:
        # Default sanitizer (optional)
        # When not provided, the native "html_sanitizer" service is wired as default.
        default: my.sanitizer

        # Custom sanitizers (optional)
        sanitizers:

            # Each sanitizer defines its own service (no prefix/suffix) to ease understanding
            # Each sanitizer also defines a named autowiring alias to ease injection using variable name.

            # Here, this sanitizer is available as a service "my.sanitizer" or using autowiring
            # as "HtmlSanitizerInterface $mySanitizer".
            my.sanitizer:
                allow_safe_elements: true
                allow_elements:
                    iframe: ['src']
                    custom-tag: ['data-attr']
                    custom-tag-2: '*'
                block_elements:
                    - section
                drop_elements:
                    - video
                allow_attributes:
                    src: ['iframe']
                    data-attr: '*'
                drop_attributes:
                    data-attr: '*'
                force_attributes:
                    a:
                        rel: noopener noreferrer
                    h1:
                        class: bp4-heading
                force_https_urls: true
                allowed_link_schemes: ['http', 'https', 'mailto']
                allowed_link_hosts: ['symfony.com']
                allow_relative_links: true
                allowed_media_schemes: ['http', 'https', 'data']
                allowed_media_hosts: ['symfony.com']
                allow_relative_medias: true

            # "all.sanitizer" / "HtmlSanitizerInterface $allSanitizer"
            all.sanitizer:
                allow_all_static_elements: true
                allow_elements:
                    custom-tag: ['data-attr']

```

This PR is still WIP (esp tests) but I wanted to gather feedback regarding the configuration and DX as soon as possible.

Commits
-------

4dd3fd6 Finished XML config implementation
e0a9339 Integrate the HtmlSanitizer component to the FrameworkBundle
@fabpot fabpot mentioned this pull request Apr 15, 2022
@DocFX
Copy link
Contributor
DocFX commented Nov 30, 2023

So happy to see this one integrated in SF! Thanks for the job, you all rock. <3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RFC] New component HtmlSanitizer
0