I'm in a situation where I need to allow users to submit full HTML (e.g. using FCKeditor or TinyMCE) while at the same time maintaining nearly perfect protection against those pesky XSS attacks. Crazy, I know, but it just has to be done. Not my choice.
I want protection against all known attacks and most future 0-day attacks.
See http://ha.ckers.org/xss.html
If I were using PHP, I'd use HTML Purifier. See http://htmlpurifier.org/
But I want to use Python (Django) for this app.
Can anyone point me towards a reliable Python library that (a) not only filters HTML tags and attributes but also checks attribute values and the plain text between tags, because some brain-dead browsers (read: Internet Explorer) will execute scripts in seemingly benign locations; (b) uses a well-audited whitelist to do so; (c) doesn't crash on seriously malformed HTML; and (d) produces valid (X)HTML as output?
I've been doing some heavy searching, but have come up with nothing except a few home-brewed solutions based on BeautifulSoup. Unfortunately, all of these only look at tags and attributes, and are hence vulnerable to more sophisticated tricks targeted at specific browsers. Even more unfortunately, a large percentage of internet users regularly use those brain-dead browsers.
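To make requirement (a) concrete, here is a rough sketch of the kind of home-brewed whitelist filter I keep finding, except built on the standard library's `html.parser` instead of BeautifulSoup and extended with a crude URL-scheme check on attribute values. The whitelist contents and the names `ALLOWED`, `SAFE_SCHEMES` and `sanitize` are my own hypothetical choices, and this is exactly the sort of unaudited code I don't want to depend on: it doesn't attempt to handle CSS (it simply refuses `style` attributes) or most of the browser-specific tricks on the ha.ckers.org list.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical whitelist: allowed tag -> allowed attributes for that tag.
ALLOWED = {
    "p": set(), "b": set(), "i": set(), "em": set(), "strong": set(),
    "a": {"href", "title"},
}
URL_ATTRS = {"href"}                            # attributes whose values are URLs
SAFE_SCHEMES = {"", "http", "https", "mailto"}  # "" means a relative URL
DROP_CONTENT = {"script", "style"}              # drop these tags AND their text


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # sanitized output fragments
        self.open_tags = []  # stack of whitelisted tags currently open
        self.skip = 0        # > 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag/attribute names and decodes entities
        # in attribute values, so "&#106;avascript:" arrives as "javascript:".
        if tag in DROP_CONTENT:
            self.skip += 1
            return
        if self.skip or tag not in ALLOWED:
            return  # unknown tag is dropped; its text is still kept (escaped)
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue
            if name in URL_ATTRS:
                # Strip whitespace/control characters that browsers may ignore
                # (e.g. "java\tscript:") before looking at the scheme.
                value = "".join(ch for ch in value if ch > " ")
                try:
                    scheme = urlsplit(value).scheme.lower()
                except ValueError:
                    continue  # unparseable URL: drop the attribute
                if scheme not in SAFE_SCHEMES:
                    continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip = max(0, self.skip - 1)
            return
        if self.skip or tag not in self.open_tags:
            return  # stray end tag: ignore it
        # Also close anything opened after `tag`, so the output stays well-nested.
        while True:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if not self.skip:
            self.out.append(escape(data, quote=False))


def sanitize(html):
    parser = WhitelistSanitizer()
    parser.feed(html)
    parser.close()
    # Close any tags the input left open.
    parser.out.extend(f"</{t}>" for t in reversed(parser.open_tags))
    return "".join(parser.out)
```

For example, `sanitize('<p onclick="x()">Hi <script>alert(1)</script><a href="javascript:alert(1)">x</a></p>')` should give `<p>Hi <a>x</a></p>`. Every decision is made against an explicit allow-list, so anything unexpected is dropped rather than passed through; what a sketch like this can't guarantee is that the list and the checks are complete, which is precisely what I'm hoping an audited library provides.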
There's also https://launchpad.net/python-html-sanitizer, which sounds promising, but it doesn't seem to have gone through any serious testing in the field.
C'mon... if PHP can do it, Python should be able to do it better...
Thanks in advance for any suggestions.