@mbijon
Last active November 1, 2022 03:23
XSS filtering in PHP (cleans various UTF encodings & nested exploits)
<?php
/*
* XSS filter, recursively handles HTML tags & UTF encoding
* Optionally handles base64 encoding
*
* ***DEPRECATION RECOMMENDED*** Not updated or maintained since 2011
* A MAINTAINED & BETTER ALTERNATIVE => kses
* https://github.com/RichardVasquez/kses/
*
* This was built from numerous sources
* (thanks all, sorry I didn't track to credit you)
*
* It was tested against *most* exploits here: http://ha.ckers.org/xss.html
* WARNING: Some weren't tested!!!
* Those include the Actionscript and SSI samples, or any newer than Jan 2011
*
*/
class xssClean {
    /*
     * Recursive worker to strip risky elements
     *
     * @param string $input Content to be cleaned. It MAY be modified in output
     * @param mixed $safe_level Optional. Anything other than 0 or '0' also enables Base64 stripping
     * @return string $output Modified $input string
     */
    public function clean_input( $input, $safe_level = 0 ) {
        $output = $input;
        do {
            // Reuse $input as the previous-pass buffer on each loop, faster than a new var
            $input = $output;
            // Remove unwanted tags
            $output = $this->strip_tags( $input );
            $output = $this->strip_encoded_entities( $output );
            // Run the Base64 pass only if the 2nd param is not empty (0, '0', '', null)
            if ( ! empty( $safe_level ) ) {
                $output = $this->strip_base64( $output );
            }
        } while ( $output !== $input );
        return $output;
    }
    /*
     * Focuses on stripping encoded entities
     * *** This appears to be why people use this sample code. Unclear how well Kses does this ***
     *
     * @param string $input Content to be cleaned. It MAY be modified in output
     * @return string $input Modified $input string
     */
    private function strip_encoded_entities( $input ) {
        // Fix &entity\n;
        $input = str_replace(array('&amp;','&lt;','&gt;'), array('&amp;amp;','&amp;lt;','&amp;gt;'), $input);
        $input = preg_replace('/(&#*\w+)[\x00-\x20]+;/u', '$1;', $input);
        $input = preg_replace('/(&#x*[0-9A-F]+);*/iu', '$1;', $input);
        $input = html_entity_decode($input, ENT_COMPAT, 'UTF-8');
        // Remove any attribute starting with "on" or xmlns
        $input = preg_replace('#(<[^>]+?[\x00-\x20"\'])(?:on|xmlns)[^>]*+[>\b]?#iu', '$1>', $input);
        // Remove javascript: and vbscript: protocols
        $input = preg_replace('#([a-z]*)[\x00-\x20]*=[\x00-\x20]*([`\'"]*)[\x00-\x20]*j[\x00-\x20]*a[\x00-\x20]*v[\x00-\x20]*a[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2nojavascript...', $input);
        $input = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*v[\x00-\x20]*b[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2novbscript...', $input);
        $input = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*-moz-binding[\x00-\x20]*:#u', '$1=$2nomozbinding...', $input);
        // Only works in IE: <span style="width: expression(alert('Ping!'));"></span>
        $input = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?expression[\x00-\x20]*\([^>]*+>#i', '$1>', $input);
        $input = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?behaviour[\x00-\x20]*\([^>]*+>#i', '$1>', $input);
        $input = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:*[^>]*+>#iu', '$1>', $input);
        return $input;
    }
    /*
     * Focuses on stripping unencoded HTML tags & namespaces
     *
     * @param string $input Content to be cleaned. It MAY be modified in output
     * @return string $input Modified $input string
     */
    private function strip_tags( $input ) {
        // Remove tags
        $input = preg_replace('#</*(?:applet|b(?:ase|gsound|link)|embed|frame(?:set)?|i(?:frame|layer)|l(?:ayer|ink)|meta|object|s(?:cript|tyle)|title|xml)[^>]*+>#i', '', $input);
        // Remove namespaced elements
        $input = preg_replace('#</*\w+:\w[^>]*+>#i', '', $input);
        return $input;
    }
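    /*
     * Focuses on stripping entities from Base64 encoded strings
     *
     * NOT ENABLED by default!
     * To enable, set the 2nd param of clean_input() to anything other than 0 or '0':
     * ie: $cleaner = new xssClean(); $cleaner->clean_input( $input_string, 1 );
     *
     * @param string $input Maybe Base64 encoded string
     * @return string $output Modified & re-encoded $input string, or $input unchanged if it is not Base64
     */
    private function strip_base64( $input ) {
        // Strict mode returns false for anything outside the Base64 alphabet
        $decoded = base64_decode( $input, true );
        // Leave non-Base64 input, and payloads that are not valid UTF-8, untouched
        // (the /u patterns in the strip routines fail on invalid UTF-8)
        if ( $decoded === false || preg_match( '//u', $decoded ) !== 1 ) {
            return $input;
        }
        $decoded = $this->strip_tags( $decoded );
        $decoded = $this->strip_encoded_entities( $decoded );
        $output = base64_encode( $decoded );
        return $output;
    }
}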
@isunilk

isunilk commented Apr 14, 2014

Thanks for it man! it helped :)

@opya

opya commented May 2, 2014

Thanks man, this helped me a lot

@sitthykun

Thanks for your effort

@donovanjackson

Does this also cover tags with uppercase text? I know that's not proper formatting for a tag, but what if someone inserts data into their DB after converting it to lowercase?
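
On the uppercase question: the tag-stripping pattern in the gist carries the `i` (case-insensitive) flag, so mixed-case tags should be caught whether or not the data is lowercased afterwards. A minimal sketch, with the pattern copied from `strip_tags()` into a standalone helper; the name `strip_risky_tags()` is hypothetical and only used for illustration:

```php
<?php
// Hypothetical standalone helper; the pattern is copied from the gist's strip_tags() method.
function strip_risky_tags(string $input): string
{
    return preg_replace(
        '#</*(?:applet|b(?:ase|gsound|link)|embed|frame(?:set)?|i(?:frame|layer)|l(?:ayer|ink)|meta|object|s(?:cript|tyle)|title|xml)[^>]*+>#i',
        '',
        $input
    );
}

$mixed = '<SCRIPT>alert(1)</SCRIPT><ScRiPt src=x></sCrIpT>ok';

// The tags are removed regardless of case; the text between them is kept.
echo strip_risky_tags($mixed), PHP_EOL;
```

Note that only the tags themselves are removed. Whatever sat between an opening and closing `<script>` stays in the output as plain text.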

@Lofanmi

Lofanmi commented Jan 2, 2015

Thanks for your effort:)

@soaj1664

soaj1664 commented Feb 6, 2015

Here is the bypass, I think:

<img src =x onerror=confirm(document.cookie);

The regular expression (https://gist.github.com/mbijon/1098477#file-xss_clean-php-L26) expects the attacker to supply the closing angle bracket, which is missing in the above vector, yet all browsers will still render this ...
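
To make the report concrete: a pattern that insists on a closing `>` never matches a tag the attacker leaves open. The sketch below uses a hypothetical `strip_on_attributes_strict()` helper. Its pattern is an assumption reconstructed from the description above (the gist's attribute rule with a mandatory `>`), not necessarily the exact code at that revision:

```php
<?php
// Assumed earlier form of the attribute rule: the trailing ">" is mandatory.
function strip_on_attributes_strict(string $input): string
{
    return preg_replace('#(<[^>]+?[\x00-\x20"\'])(?:on|xmlns)[^>]*+>#iu', '$1>', $input);
}

$closed   = '<img src=x onerror=confirm(document.cookie)>';
$unclosed = '<img src =x onerror=confirm(document.cookie);';

echo strip_on_attributes_strict($closed), PHP_EOL;   // the event handler is stripped
echo strip_on_attributes_strict($unclosed), PHP_EOL; // no ">" to anchor on, so the input comes back unchanged
```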

@mrhsce

mrhsce commented Mar 6, 2015

Thanks a lot, my fellow developer.
I think that if you compare your code with the anti-XSS library at the link below,
you can improve it even further:
https://code.google.com/p/php-antixss/downloads/detail?name=AntiXSS.php&can=2&q=

@voku

voku commented Jun 17, 2015

@shyandsy

shyandsy commented Jul 2, 2015

Thanks a lot,
helpful

@mbijon
Author

mbijon commented Jul 22, 2015

@ALL: FYI, I haven't updated this function since 2011, so that's likely one of MANY VULNS in this lib. Unless you have performance constraints, I recommend HTML Purifier instead of this quick & dirty method.


It might be time to update this lib or just pull it down:

  • @soaj1664 Thanks for catching that <img src =x onerror=confirm(document.cookie);.
  • @voku I'll try running your anti-xss tester from bc003bf against this.

@github-wuzhh

Tks

@SergeSysoev

I tried adding xss_clean("< a href="#">a</ a>") (without the spaces) to a test DB. It was inserted, but it destroyed some of the data. Is this a bug, or does it fall outside the XSS category?

@MrQuiet

MrQuiet commented Sep 29, 2015

Thanks a lot,
helpful

@ymakux

ymakux commented Oct 26, 2015

A better alternative to this class https://github.com/ymakux/xss
Ready for production. Based on Kses and Drupal 7 filter

@rola2010

Please, I need a case that this filter cannot catch. I have tried most cases, but they were all caught. I need at least one case that bypasses this filter.

@Barismes

rola2010 : test" onmouseover="alert(document.cookie);"

@rola2010

Asynth: unfortunately the code catches this. It seems it can't be broken.

@mbijon
Author

mbijon commented Mar 8, 2016

I still recommend HTML Purifier or kses instead of this gist.

However:

  • UPDATED: This class-based version of xssClean recurses BOTH the encoded-entity & tag-removal routines. This solves a vulnerability found by 0xmitsurugi & reported privately.
  • UPDATED: The exploit reported by @soaj1664 in this comment is fixed. It could only have been effective at the end of the file, or if there were no other ">" characters after the exploit. The problem with filtering that exploit when there is no closing ">" is that users would see the entire rest of the input removed. Instead of removing the entire message body after the exploit, I've chosen to remove up to the next non-word character (such as a newline or end-of-file character). This could still remove the rest of the input if there are no non-word characters before the end, but it keeps some of the message in MOST cases while still making this filter more secure.
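
For reference, here is how the updated attribute rule behaves on an unclosed tag. One caveat, offered as a reading of the pattern rather than a statement of the author's intent: inside a character class `\b` is a backspace, not a word boundary, so the match appears to run to the next `>` or to the end of the input instead of stopping at the next non-word character. The helper name `strip_on_attributes()` is hypothetical; the pattern is copied from `strip_encoded_entities()`:

```php
<?php
// Hypothetical wrapper around the gist's current "on*/xmlns" attribute rule.
function strip_on_attributes(string $input): string
{
    return preg_replace('#(<[^>]+?[\x00-\x20"\'])(?:on|xmlns)[^>]*+[>\b]?#iu', '$1>', $input);
}

// Unclosed tag followed by more text on two lines, with no later ">".
$message = "Hello <img src=x onerror=alert(1) and more text\nsecond line";

// The handler is gone, and so is everything after it, including the second line.
echo strip_on_attributes($message), PHP_EOL;
```

If a later `>` does exist in the input, the match stops there, so the text after that bracket is kept.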

@webhacking

👍

@go-english

thank you so much

@petrospap

Hello, I am new to all this XSS stuff.
How can I prevent this?

<?php
$_GET['a'] = 'javascript:alert(document.cookie)';
$href = xssClean($_GET['a']);
echo '<a href="'.$href.'">XSS link</a>';
?>

@properties

properties commented Aug 2, 2016

@petrospap

Just echo it with htmlspecialchars(). Example:

<?php
$_GET['a'] = 'javascript:alert(document.cookie)';
$href = $_GET['a'];
echo '<a href="'.htmlspecialchars($href).'">XSS link</a>';
?>
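
Worth noting: `htmlspecialchars()` escapes `<`, `>`, `&` and quotes, but it leaves a `javascript:` scheme intact, so a link built this way would still run script when clicked. One common approach is to allowlist URL schemes before escaping. A minimal sketch, where `safe_href()` and its default scheme list are hypothetical names and choices made for illustration:

```php
<?php
// Hypothetical helper: allow only known schemes (or relative URLs), then escape for the attribute.
function safe_href(string $url, array $allowed = ['http', 'https', 'mailto']): string
{
    // Browsers ignore tabs and newlines inside a scheme, so strip control chars and spaces before checking.
    $candidate = preg_replace('/[\x00-\x20]+/', '', $url);

    // A leading "name:" is a scheme; anything without one is treated as a relative URL.
    if (preg_match('/^([a-z][a-z0-9+.\-]*):/i', $candidate, $m)
        && !in_array(strtolower($m[1]), $allowed, true)) {
        return '#'; // neutral placeholder for rejected schemes
    }

    return htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
}

echo '<a href="' . safe_href('javascript:alert(document.cookie)') . '">XSS link</a>', PHP_EOL;
echo '<a href="' . safe_href('https://example.com/?a=1&b=2') . '">OK link</a>', PHP_EOL;
```

Checking the scheme and escaping the value are separate jobs: the first decides whether the URL may be used at all, the second keeps it from breaking out of the attribute.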

@rudSarkar

pice of code ;)

@nat4tq

nat4tq commented Feb 8, 2018

Hi,

I was testing your filter against a set of XSS test inputs. It seems that your filter is still vulnerable to XSS, for example with inputs that carry XSS payloads in comment, anchor and image tags. One example of each:

<!--#exec cmd=""/usr/X11R6/bin/xterm ?display 127.0.0.1:0 &""-->
<a href="jAvAsCrIpT&colon;alert&lpar;1&rpar;">X</a>
/><img/onerror=\x09javascript:alert(1)\x09src=xxx:x />

A full report can be read in our paper, "Assessment of Dynamic Open-source Cross-site Scripting Filters as Security Devices in Web Applications". I kindly suggest that you add these tags to the blacklist to make it more robust against XSS.
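
The anchor example appears to get through because `strip_encoded_entities()` decodes with `ENT_COMPAT` only, which uses the HTML 4.01 entity table. HTML5-only names such as `&colon;`, `&lpar;` and `&rpar;` are left as they are, so the `javascript:` rule never sees a literal colon, while a browser still decodes them. A small sketch of the difference; decoding with `ENT_HTML5` is shown as one possible mitigation and is an assumption, not something the gist does:

```php
<?php
$payload = '<a href="jAvAsCrIpT&colon;alert&lpar;1&rpar;">X</a>';

// What the gist does: HTML 4.01 entity table, so HTML5-only names are left untouched.
$html401 = html_entity_decode($payload, ENT_COMPAT, 'UTF-8');

// Possible mitigation (assumption, not in the gist): decode against the HTML5 table.
$html5 = html_entity_decode($payload, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Compare what each decode leaves for the later javascript: pattern to inspect.
var_dump($html401 === $payload);
var_dump(strpos($html5, 'jAvAsCrIpT:alert(1)') !== false);
```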

Thank you.

@rusyasoft

@nat4tq share your paper with us then :)

@xiamuguizhi

@nat4tq I tried the code you provided on the local server. This is really a problem. Do you have any solutions?

@voku

voku commented Oct 31, 2022

> @nat4tq I tried the code you provided on the local server. This is really a problem. Do you have any solutions?

You could try this library : https://github.com/voku/anti-xss

@mbijon
Author

mbijon commented Nov 1, 2022

My method is deprecated. I now recommend http://htmlpurifier.org/
