Linode Forum
Linode Community Forums
Posted: Mon May 25, 2009 3:10 pm
Senior Member

Joined: Fri May 02, 2008 8:44 pm
Posts: 1121
I'm in a situation where I need to allow users to submit full HTML (e.g. using FCKeditor or TinyMCE) while at the same time maintaining nearly perfect protection against those pesky XSS attacks. Crazy, I know, but it just has to be done. Not my choice.

I want protection against all known attacks and most future 0-day attacks.
See http://ha.ckers.org/xss.html

If I were using PHP, I'd use HTML Purifier.
See http://htmlpurifier.org/

But I want to use Python (Django) for this app.

Can anyone point me towards a reliable Python library that (a) not only filters HTML tags and attributes but also checks the values and plain text between tags -- because some brain-dead browsers (read: Internet Explorer) will execute scripts in seemingly benign locations; (b) uses a well-audited whitelist in doing so; (c) doesn't crash on seriously malformed HTML; and (d) produces valid (X)HTML as output?

I've been doing some heavy searching, but came up with nothing except a few home-brewed solutions based on BeautifulSoup. Unfortunately, all of these only look at tags and attributes, and are hence vulnerable to more sophisticated tricks targeted at specific browsers. Even more unfortunately, a large percentage of internet users regularly use those brain-dead browsers.
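A minimal sketch of the tag-and-attribute approach described above, written with the Python 3 standard library (`html.parser`) rather than BeautifulSoup. The whitelist sets and the names `TagWhitelistSanitizer` and `sanitize_tags` are hypothetical and chosen for illustration; they are not a vetted list. It drops non-whitelisted tags and attributes, discards `<script>`/`<style>` content, escapes text and closes unclosed tags so the output stays well nested. Like the home-brewed solutions mentioned, it never inspects attribute values.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist for illustration only; not a vetted list.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "a", "img", "ul", "ol", "li", "br"}
ALLOWED_ATTRS = {"a": {"href", "title"}, "img": {"src", "alt"}}
VOID_TAGS = {"br", "img"}
DROP_CONTENT_TAGS = {"script", "style"}


class TagWhitelistSanitizer(HTMLParser):
    """Keeps whitelisted tags/attributes only; attribute VALUES are not checked."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []
        self.skipping = False  # True while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT_TAGS:
            self.skipping = True
            return
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself but keep its text content
        allowed = ALLOWED_ATTRS.get(tag, set())
        kept = "".join(
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if name in allowed
        )
        if tag in VOID_TAGS:
            self.out.append(f"<{tag}{kept} />")
        else:
            self.out.append(f"<{tag}{kept}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT_TAGS:
            self.skipping = False
            return
        if tag in VOID_TAGS or tag not in self.open_tags:
            return  # stray or disallowed end tag
        # Close any inner tags left open so the output stays well nested.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if not self.skipping:
            self.out.append(escape(data, quote=False))

    # Comments, doctypes and processing instructions are silently dropped
    # because the HTMLParser base-class handlers do nothing with them.

    def result(self):
        self.close()  # flush any buffered text
        while self.open_tags:
            self.out.append(f"</{self.open_tags.pop()}>")
        return "".join(self.out)


def sanitize_tags(html_text):
    parser = TagWhitelistSanitizer()
    parser.feed(html_text)
    return parser.result()
```

For example, `sanitize_tags('<img src="jav&#x0A;ascript:x()">')` keeps the dangerous `src` value: the parser decodes the character reference, but nothing checks the result, which is exactly the weakness complained about here.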

There's also this: https://launchpad.net/python-html-sanitizer which sounds promising, but it doesn't seem to have gone through any serious testing in the field.

C'mon... if PHP can do it, Python should be able to do it better...

Thanks in advance for any suggestions.


Posted: Mon May 25, 2009 4:07 pm
Senior Newbie

Joined: Mon May 25, 2009 3:44 pm
Posts: 5
Website: http://www.turleando.com.ar/
Location: Rosario, Argentina
Why don't you use HTML Purifier through a "filter application"? You could make a simple PHP script that receives HTML code as a command-line parameter, runs HTML Purifier on it, and then prints the clean HTML. Then you could use os.system() or similar from Python to invoke the script and clean the code.
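A rough sketch of the filter idea in present-day Python. It feeds the HTML to the filter on standard input rather than as a command-line parameter (avoiding shell quoting problems and argument-length limits) and uses `subprocess.run` instead of `os.system()`, since `os.system()` does not hand back the command's output. `purify.php` is a hypothetical script, assumed to read HTML from stdin, run HTML Purifier on it and print the result.

```python
import subprocess

# Hypothetical command: a small PHP script that reads HTML on stdin,
# runs HTML Purifier on it, and prints the cleaned result to stdout.
PURIFIER_CMD = ["php", "purify.php"]


def purify_via_filter(dirty_html, cmd=PURIFIER_CMD, timeout=10):
    """Pipe HTML through an external filter process and return its stdout."""
    result = subprocess.run(
        cmd,
        input=dirty_html,       # sent on stdin, never through a shell
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,             # raise CalledProcessError if the filter fails
    )
    return result.stdout
```

Any command that reads stdin and writes stdout can stand in for the PHP script while testing the Python side.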


Posted: Mon May 25, 2009 5:05 pm
Senior Member

Joined: Fri May 02, 2008 8:44 pm
Posts: 1121
turl wrote:
Why don't you use HTML Purifier through a "filter application"?


Sure, that's a possibility. Maybe even run PHP as a daemon (as Apache module, or using FastCGI) and communicate with it over standard HTTP for better concurrency.
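A sketch of the client side of the daemon variant, using only `urllib` from the Python 3 standard library. The URL `http://127.0.0.1:8080/purify.php` and the form field name `html` are hypothetical; the PHP side is assumed to read that field, run HTML Purifier and return the cleaned markup as the response body.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint: a PHP page under Apache or FastCGI that runs
# HTML Purifier on the "html" form field and returns the clean markup.
PURIFIER_URL = "http://127.0.0.1:8080/purify.php"


def purify_via_http(dirty_html, url=PURIFIER_URL, timeout=5):
    """POST the HTML to the purifier service and return its response body."""
    body = urllib.parse.urlencode({"html": dirty_html}).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,  # supplying data makes this a POST request
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

Keeping the PHP process resident means each request avoids starting a new interpreter, which is the concurrency benefit mentioned above.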

Still, I'd prefer a pure Python solution if at all possible. I don't want to lug PHP around.


Posted: Tue May 26, 2009 7:29 am
Senior Member

Joined: Mon Dec 10, 2007 4:30 pm
Posts: 341
Website: http://markwalling.org
http://genshi.edgewall.org/ ?

I used this *once* on a project I never finished, but it seemed like the smartest way to go at the time.


Posted: Tue May 26, 2009 5:47 pm
Senior Member

Joined: Fri May 02, 2008 8:44 pm
Posts: 1121
mwalling wrote:
http://genshi.edgewall.org/ ?


Didn't know that Genshi had a HTML Sanitizer feature... But then, it seems rather poorly documented. Not sure if I can trust this one. Maybe I'll dig into the source code a little bit 8)


Posted: Wed May 27, 2009 2:10 am
Senior Member

Joined: Wed May 13, 2009 1:18 am
Posts: 681
hybinet wrote:
Didn't know that Genshi had a HTML Sanitizer feature... But then, it seems rather poorly documented. Not sure if I can trust this one. Maybe I'll dig into the source code a little bit 8)

I actually thought Genshi documentation was quite fine, although perhaps earlier exposure to TAL and Kid already had me thinking in the tag/attribute markup mode. I still like the approach for templating.

But I don't think Genshi has anything like a sanitizer, unless you count the fact that its template parsing is strict XML. I suspect you're not looking to parse the user-supplied HTML as a Genshi template, and even if you did, it would likely not complain about well-formed XSS attacks.
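To illustrate the point with the standard library instead of Genshi: `xml.etree.ElementTree` (standing in here for any strict XML parser) only checks well-formedness, so a well-formed attack passes while innocent but sloppy markup is rejected. `is_well_formed` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if markup parses as XML; says nothing about safety."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# A script-bearing fragment is perfectly well-formed XML...
print(is_well_formed('<div><img src="javascript:alert(1)" /></div>'))  # True
# ...while an innocent unclosed tag is rejected.
print(is_well_formed("<div><b>bold</div>"))  # False
```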

-- David


Posted: Wed May 27, 2009 3:50 am
Senior Member

Joined: Fri May 02, 2008 8:44 pm
Posts: 1121
Genshi documentation is just fine. I was talking about the nonexistent documentation of the HTML Sanitizer feature mentioned above. Anyway, all it seems to do is filter tags and attributes; it doesn't look inside them at all.

I want to detect tricks like the following, which unfortunately works in IE6.
Code:
<img src="jav&#x0A;ascript:doNastyThings();">
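A minimal sketch of a value check that catches this particular trick, assuming the attribute value is taken raw from the markup: decode character references with `html.unescape`, strip the ASCII control and whitespace characters that some browsers skip inside a scheme, then compare the scheme against a whitelist. `SAFE_SCHEMES` and `is_safe_url` are illustrative names, and the scheme list is an assumption, not an audited one. Note that `html.parser` already decodes references in attribute values, so running its output through this function decodes twice, which errs on the side of rejecting.

```python
import html
import re
from urllib.parse import urlsplit

# Illustrative whitelist only; a production list needs proper review.
SAFE_SCHEMES = {"http", "https", "mailto"}

# ASCII control characters, space and DEL, which some browsers (old IE
# among them) have been known to ignore inside a URL scheme.
_IGNORABLE = re.compile(r"[\x00-\x20\x7f]+")


def is_safe_url(raw_value):
    """Whitelist check for a URL-bearing attribute value such as src or href."""
    value = html.unescape(raw_value)   # "&#x0A;" -> "\n", "&#106;" -> "j"
    value = _IGNORABLE.sub("", value)  # "jav\nascript:" -> "javascript:"
    scheme = urlsplit(value).scheme.lower()
    if scheme:
        return scheme in SAFE_SCHEMES
    # No recognizable scheme: accept it only if no colon appears before the
    # first '/', '?' or '#', so it really is a relative reference.
    head = re.split(r"[/?#]", value, maxsplit=1)[0]
    return ":" not in head
```

In a sanitizer, a check like this would be applied to every URL-bearing attribute, dropping the attribute whenever it returns False.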

