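Cross-site scripting is abbreviated XSS rather than CSS to avoid confusion with Cascading Style Sheets (CSS). An attacker injects malicious script code into a web page; when a user views that page, the embedded script runs in the user's browser, which is how the attack reaches the user.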

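Jsoup can effectively filter out unsafe markup. It guards against XSS with a whitelist mechanism: if the whitelist allows only the <span> tag, for example, every other tag is filtered out.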
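As a minimal sketch of the whitelist idea, the snippet below builds a whitelist that permits only <span> and cleans a small fragment. It assumes the Jsoup 1.9.2 API used later in this article (newer Jsoup releases renamed Whitelist to Safelist); the class name SpanOnlyDemo and the sample input are illustrative only.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

class SpanOnlyDemo {

    // Cleans an HTML fragment with a whitelist that allows only <span> (no attributes).
    static String cleanSpanOnly(String html) {
        Whitelist spanOnly = new Whitelist().addTags("span");
        return Jsoup.clean(html, spanOnly);
    }

    public static void main(String[] args) {
        // Illustrative input: one allowed tag, one disallowed tag, one script.
        String dirty = "<span>hello</span><b>bold</b><script>alert(1)</script>";
        System.out.println(cleanSpanOnly(dirty));
    }
}
```

The <span> element survives, the <b> tag is removed while its text is kept, and the <script> element is dropped together with its contents.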

Common XSS attacks

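Suppose a form on the page lets users enter arbitrary content, and a mischievous user enters the following:

[Figure 1: the content entered by the user]

After saving, you will find that all the text on the page has turned red!

[Figure 2: the page with its text turned red]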

Or the user enters <script>for(var i=0;i<10;i++){alert("fuck you");}</script>; after saving, the page pops up an alert box 10 times!

Adding Jsoup

Create a simple Spring Boot project with Maven and add the following dependency to the pom:

    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.9.2</version>
    </dependency>

JsoupUtil

Create a JsoupUtil utility class:

    package cc.mrbird.common.util; // matches the import used by XssHttpServletRequestWrapper

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.safety.Whitelist;

    /**
     * XSS filtering utility.
     */
    public class JsoupUtil {

        private static final Whitelist whitelist = Whitelist.basicWithImages();

        /*
         * Output settings: do not pretty-print (reformat) the cleaned markup.
         */
        private static final Document.OutputSettings outputSettings =
                new Document.OutputSettings().prettyPrint(false);

        static {
            /*
             * Rich-text editors implement some styling through the style attribute,
             * e.g. red text via style="color:red;", so allow style on all tags.
             */
            whitelist.addAttributes(":all", "style");
        }

        public static String clean(String content) {
            // The empty base URI means relative links are not resolved against any page.
            return Jsoup.clean(content, "", whitelist, outputSettings);
        }
    }
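To see what the extra style attribute changes, here is a small sketch that cleans the same fragment with the stock basicWithImages whitelist and with the widened one. It assumes the Jsoup 1.9.2 API; StyleAttributeDemo and its method names are made up for illustration and are not part of JsoupUtil.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Whitelist;

class StyleAttributeDemo {

    // Same output settings as JsoupUtil: no pretty-printing.
    private static final Document.OutputSettings SETTINGS =
            new Document.OutputSettings().prettyPrint(false);

    // Stock basicWithImages: style is not an allowed attribute.
    static String cleanDefault(String html) {
        return Jsoup.clean(html, "", Whitelist.basicWithImages(), SETTINGS);
    }

    // basicWithImages plus style on every tag, as configured in JsoupUtil.
    static String cleanWithStyle(String html) {
        Whitelist widened = Whitelist.basicWithImages().addAttributes(":all", "style");
        return Jsoup.clean(html, "", widened, SETTINGS);
    }

    public static void main(String[] args) {
        String html = "<span style=\"color:red\">red</span>";
        System.out.println(cleanDefault(html));   // style attribute is stripped
        System.out.println(cleanWithStyle(html)); // style attribute is kept
    }
}
```

With the widened whitelist, inline styles submitted by users are preserved, so it is worth adding only when rich-text input really needs them.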

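The whitelist used here is basicWithImages. Jsoup ships with several common whitelists to choose from, as listed in the table below:

Whitelist | Tags | Description
none | (no tags) | Keeps only the text content inside tags
simpleText | b, em, i, strong, u | Simple text-formatting tags
basic | a, b, blockquote, br, cite, code, dd, dl, dt, em, i, li, ol, p, pre, q, small, span, strike, strong, sub, sup, u, ul | Tags for basic use
basicWithImages | everything in basic, plus the img tag with its src, align, alt, height, width and title attributes | basic plus the img tag
relaxed | a, b, blockquote, br, caption, cite, code, col, colgroup, dd, div, dl, dt, em, h1, h2, h3, h4, h5, h6, i, img, li, ol, p, pre, q, small, span, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, u, ul | basicWithImages plus a further set of tags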
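The differences in the table are easiest to see by running one input through each built-in whitelist. The following is a rough sketch assuming the Jsoup 1.9.2 API; the class name BuiltInWhitelistDemo and the sample markup are illustrative.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

class BuiltInWhitelistDemo {

    // Sample input: a paragraph, a bold tag, and an image carrying an event handler.
    static final String INPUT =
            "<p>Hi <b>there</b><img src=\"http://example.com/a.png\" onerror=\"alert(1)\"></p>";

    // Cleans the sample input with the given whitelist.
    static String clean(Whitelist whitelist) {
        return Jsoup.clean(INPUT, whitelist);
    }

    public static void main(String[] args) {
        System.out.println("none:            " + clean(Whitelist.none()));
        System.out.println("simpleText:      " + clean(Whitelist.simpleText()));
        System.out.println("basic:           " + clean(Whitelist.basic()));
        System.out.println("basicWithImages: " + clean(Whitelist.basicWithImages()));
        System.out.println("relaxed:         " + clean(Whitelist.relaxed()));
    }
}
```

Each step down the list keeps more of the markup: none leaves only text, simpleText keeps <b>, basic adds <p>, and the image appears from basicWithImages onward. The onerror attribute is not allowed by any of the built-in whitelists, so it is removed in every case.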

XssHttpServletRequestWrapper

Create an XssHttpServletRequestWrapper that overrides getParameter(), getParameterValues() and getHeader() to filter malicious content out of the parameters of an HTTP request:

    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletRequestWrapper;
    import org.apache.commons.lang.StringUtils;
    import cc.mrbird.common.util.JsoupUtil;

    /**
     * Filters HTTP requests with Jsoup to prevent XSS attacks.
     */
    public class XssHttpServletRequestWrapper extends HttpServletRequestWrapper {

        private final HttpServletRequest orgRequest;
        private final boolean isIncludeRichText;

        public XssHttpServletRequestWrapper(HttpServletRequest request, boolean isIncludeRichText) {
            super(request);
            this.orgRequest = request;
            this.isIncludeRichText = isIncludeRichText;
        }

        /**
         * Overrides getParameter so that both the parameter name and its value are XSS-filtered.
         * To get the unfiltered value, read it from the original request (see getOrgRequest()).
         * getParameterNames, getParameterValues and getParameterMap may also need to be overridden.
         */
        @Override
        public String getParameter(String name) {
            // Rich-text parameters are returned unfiltered when rich-text filtering is switched off.
            if (("content".equals(name) || name.endsWith("WithHtml")) && !isIncludeRichText) {
                return super.getParameter(name);
            }
            name = JsoupUtil.clean(name);
            String value = super.getParameter(name);
            if (StringUtils.isNotBlank(value)) {
                value = JsoupUtil.clean(value);
            }
            return value;
        }

        @Override
        public String[] getParameterValues(String name) {
            String[] arr = super.getParameterValues(name);
            if (arr != null) {
                for (int i = 0; i < arr.length; i++) {
                    arr[i] = JsoupUtil.clean(arr[i]);
                }
            }
            return arr;
        }

        /**
         * Overrides getHeader so that both the header name and its value are XSS-filtered.
         * To get the unfiltered value, read it from the original request (see getOrgRequest()).
         * getHeaderNames may also need to be overridden.
         */
        @Override
        public String getHeader(String name) {
            name = JsoupUtil.clean(name);
            String value = super.getHeader(name);
            if (StringUtils.isNotBlank(value)) {
                value = JsoupUtil.clean(value);
            }
            return value;
        }

        /**
         * Returns the original request.
         */
        public HttpServletRequest getOrgRequest() {
            return orgRequest;
        }

        /**
         * Static helper that returns the original request.
         */
        public static HttpServletRequest getOrgRequest(HttpServletRequest req) {
            if (req instanceof XssHttpServletRequestWrapper) {
                return ((XssHttpServletRequestWrapper) req).getOrgRequest();
            }
            return req;
        }
    }
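The condition at the top of getParameter() decides which parameters bypass cleaning, and it is easy to misread. The sketch below isolates it in a hypothetical helper, skipsCleaning (not part of the original class), and uses only the JDK so the rule can be checked on its own.

```java
class RichTextRuleDemo {

    // Mirrors the condition in getParameter(): a parameter bypasses cleaning only when
    // it looks like rich text ("content", or a name ending in "WithHtml") AND the
    // wrapper was created with isIncludeRichText == false.
    static boolean skipsCleaning(String name, boolean isIncludeRichText) {
        boolean richTextParam = "content".equals(name) || name.endsWith("WithHtml");
        return richTextParam && !isIncludeRichText;
    }

    public static void main(String[] args) {
        System.out.println(skipsCleaning("content", false));      // true: returned as-is
        System.out.println(skipsCleaning("content", true));       // false: cleaned
        System.out.println(skipsCleaning("bodyWithHtml", false)); // true: returned as-is
        System.out.println(skipsCleaning("title", false));        // false: cleaned
    }
}
```

Note that in the wrapper above this rule applies to getParameter() only; getParameterValues() and getHeader() always clean their values.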

XssFilter

Create an XssFilter. It wraps each request in the XssHttpServletRequestWrapper defined above, so that calls such as getParameter() return filtered values:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import org.apache.commons.lang.BooleanUtils;
    import org.apache.commons.lang.StringUtils;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    /**
     * Filter that guards against XSS attacks.
     */
    public class XssFilter implements Filter {

        private static final Logger logger = LoggerFactory.getLogger(XssFilter.class);

        // Whether rich-text content is filtered as well
        private static boolean IS_INCLUDE_RICH_TEXT = false;

        // URL patterns that are passed through without filtering
        public List<String> excludes = new ArrayList<String>();

        @Override
        public void init(FilterConfig filterConfig) throws ServletException {
            logger.info("------------ xss filter init ------------");
            String isIncludeRichText = filterConfig.getInitParameter("isIncludeRichText");
            if (StringUtils.isNotBlank(isIncludeRichText)) {
                IS_INCLUDE_RICH_TEXT = BooleanUtils.toBoolean(isIncludeRichText);
            }
            // "excludes" is a comma-separated list of URL patterns
            String temp = filterConfig.getInitParameter("excludes");
            if (temp != null) {
                for (String url : temp.split(",")) {
                    excludes.add(url);
                }
            }
        }

        @Override
        public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
                throws IOException, ServletException {
            HttpServletRequest req = (HttpServletRequest) request;
            HttpServletResponse resp = (HttpServletResponse) response;
            // Excluded URLs continue down the chain with the unwrapped request.
            if (handleExcludeURL(req, resp)) {
                chain.doFilter(request, response);
                return;
            }
            XssHttpServletRequestWrapper xssRequest =
                    new XssHttpServletRequestWrapper(req, IS_INCLUDE_RICH_TEXT);
            chain.doFilter(xssRequest, response);
        }

        @Override
        public void destroy() {
        }

        // Each exclude entry is treated as a regular expression anchored at the start of the servlet path.
        private boolean handleExcludeURL(HttpServletRequest request, HttpServletResponse response) {
            if (excludes == null || excludes.isEmpty()) {
                return false;
            }
            String url = request.getServletPath();
            for (String pattern : excludes) {
                Pattern p = Pattern.compile("^" + pattern);
                Matcher m = p.matcher(url);
                if (m.find()) {
                    return true;
                }
            }
            return false;
        }
    }
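Because handleExcludeURL() compiles each exclude entry as a regular expression, an entry such as /img/* behaves as a prefix match rather than as a servlet-style wildcard. The JDK-only sketch below reproduces that matching rule in a hypothetical helper, isExcluded, so the behaviour can be tried in isolation; the sample paths are made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class ExcludeMatchDemo {

    // Same rule as handleExcludeURL(): every entry is used as a regex anchored at the
    // start of the path, and a partial match (find) is enough.
    static boolean isExcluded(String servletPath, List<String> excludes) {
        for (String pattern : excludes) {
            if (Pattern.compile("^" + pattern).matcher(servletPath).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> excludes = Arrays.asList("/favicon.ico", "/img/*", "/js/*", "/css/*");
        System.out.println(isExcluded("/img/logo.png", excludes)); // true
        System.out.println(isExcluded("/user/save", excludes));    // false
        // "/img/*" as a regex means "/img" followed by zero or more slashes,
        // so a path that merely starts with "/img" is excluded too.
        System.out.println(isExcluded("/imgs/x.png", excludes));   // true
    }
}
```

If stricter matching is needed, one option is to write the entries as explicit regular expressions, for example /img/.* for everything under /img/.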

Configuring XssFilter in Spring Boot

Register the filter using JavaConfig:

    // Declared inside a @Configuration class; needs java.util.HashMap and java.util.Map.
    @Bean
    public FilterRegistrationBean xssFilterRegistrationBean() {
        FilterRegistrationBean filterRegistrationBean = new FilterRegistrationBean();
        filterRegistrationBean.setFilter(new XssFilter());
        filterRegistrationBean.setOrder(1);
        filterRegistrationBean.setEnabled(true);
        // Apply the filter to every request
        filterRegistrationBean.addUrlPatterns("/*");
        Map<String, String> initParameters = new HashMap<String, String>();
        // Static resources that skip the filter (read by XssFilter.init)
        initParameters.put("excludes", "/favicon.ico,/img/*,/js/*,/css/*");
        // "true" means rich-text parameters are filtered as well
        initParameters.put("isIncludeRichText", "true");
        filterRegistrationBean.setInitParameters(initParameters);
        return filterRegistrationBean;
    }
