Package tilda.utils

Class HTMLFilter


  • public class HTMLFilter
    extends java.lang.Object
    • Constructor Summary

      Constructors 
      Constructor Description
      HTMLFilter()  
    • Method Summary

      Modifier and Type Method Description
      static java.lang.String cleanAbsolute​(java.lang.String Str)
      Blindly replaces every '<' and '>' in the passed-in Str with '&lt;' and '&gt;' respectively.
      static java.lang.String cleanSmart​(java.lang.String Name, java.lang.String Str)
      Detects and disables several potentially dangerous code snippets in HTML content.
      protected static int detect​(java.lang.String Name, java.util.regex.Pattern P, java.lang.String Str)  
      protected static int findFirst​(java.lang.String Name, java.util.regex.Pattern P, java.lang.String Str)
      Returns the start index of the first match.
      protected static int findLast​(java.lang.String Name, java.util.regex.Pattern P, java.lang.String Str)
      Returns the end index of the last match.
      protected static java.lang.String formatReportOutput​(java.lang.String Name, java.lang.String Value)  
      protected static java.lang.String getEndTagRegex​(java.lang.String Tag)  
      static java.util.List<java.lang.String> getFilterReportForThread()  
      protected static java.lang.String getStartTagRegex​(java.lang.String Tag)  
      protected static java.lang.String getTagBlockRegex​(java.lang.String[] Tags)  
      protected static void replace​(java.lang.String Name, java.util.regex.Pattern P, java.lang.StringBuffer Src, java.lang.StringBuffer Dest, java.lang.String ReplaceStr)  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • DQUOTED_STR_BASE

        protected static final java.lang.String DQUOTED_STR_BASE
        See Also:
        Constant Field Values
      • _BODYSTART_PATTERN

        protected static final java.util.regex.Pattern _BODYSTART_PATTERN
      • _BODYEND_PATTERN

        protected static final java.util.regex.Pattern _BODYEND_PATTERN
      • _TAGREMOVE_PATTERN

        protected static final java.util.regex.Pattern _TAGREMOVE_PATTERN
      • _JS_PATTERN

        protected static final java.util.regex.Pattern _JS_PATTERN
      • _ONXXX_PATTERN

        protected static final java.util.regex.Pattern _ONXXX_PATTERN
    • Constructor Detail

      • HTMLFilter

        public HTMLFilter()
    • Method Detail

      • cleanAbsolute

        public static java.lang.String cleanAbsolute​(java.lang.String Str)
        Blindly replaces every '<' and '>' in the passed-in Str with '&lt;' and '&gt;' respectively. Very fast, but it escapes all markup, so the content no longer renders as HTML.
        Parameters:
        Str - The string to clean up
        Returns:
        the cleaned up Str
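
        The behavior described above can be illustrated with a minimal sketch. This is not the library's source: the class CleanAbsoluteSketch and the method escapeAngleBrackets are hypothetical stand-ins, and returning null for a null input is an assumption that the documentation does not confirm.

```java
class CleanAbsoluteSketch {
    // Hypothetical stand-in for the documented behavior of cleanAbsolute:
    // escape every '<' and '>' and leave all other characters untouched.
    static String escapeAngleBrackets(String str) {
        if (str == null) {
            return null; // assumption: null passes through unchanged
        }
        return str.replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // Note that '&' is not escaped; only the angle brackets are.
        System.out.println(escapeAngleBrackets("<b>Hi</b> & <script>x()</script>"));
    }
}
```

        Because every tag is escaped, legitimate markup such as <b> is neutralized along with <script>, which is the trade-off the description warns about.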
      • getStartTagRegex

        protected static java.lang.String getStartTagRegex​(java.lang.String Tag)
      • getEndTagRegex

        protected static java.lang.String getEndTagRegex​(java.lang.String Tag)
      • getTagBlockRegex

        protected static java.lang.String getTagBlockRegex​(java.lang.String[] Tags)
      • getFilterReportForThread

        public static java.util.List<java.lang.String> getFilterReportForThread()
      • formatReportOutput

        protected static java.lang.String formatReportOutput​(java.lang.String Name,
                                                             java.lang.String Value)
      • replace

        protected static void replace​(java.lang.String Name,
                                      java.util.regex.Pattern P,
                                      java.lang.StringBuffer Src,
                                      java.lang.StringBuffer Dest,
                                      java.lang.String ReplaceStr)
      • findFirst

        protected static int findFirst​(java.lang.String Name,
                                       java.util.regex.Pattern P,
                                       java.lang.String Str)
        Returns the start index of the first match.
      • findLast

        protected static int findLast​(java.lang.String Name,
                                      java.util.regex.Pattern P,
                                      java.lang.String Str)
        Returns the end index of the last match.
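
        As a rough illustration of what "start index of the first match" and "end index of the last match" mean for findFirst and findLast, here is a sketch built on java.util.regex.Matcher. The class MatchIndexSketch and its methods are hypothetical. The Name parameter is omitted, and returning -1 when nothing matches is an assumption, since the documentation does not state the no-match value.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchIndexSketch {
    // Start index of the first match, or -1 if there is none (assumed convention).
    static int firstStart(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.start() : -1;
    }

    // End index (exclusive) of the last match, or -1 if there is none (assumed convention).
    static int lastEnd(Pattern p, String s) {
        Matcher m = p.matcher(s);
        int end = -1;
        while (m.find()) {
            end = m.end(); // keep overwriting until the final match
        }
        return end;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("ab");
        System.out.println(firstStart(p, "xxabyyabz")); // prints 2
        System.out.println(lastEnd(p, "xxabyyabz"));    // prints 8
    }
}
```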
      • detect

        protected static int detect​(java.lang.String Name,
                                    java.util.regex.Pattern P,
                                    java.lang.String Str)
      • cleanSmart

        public static java.lang.String cleanSmart​(java.lang.String Name,
                                                  java.lang.String Str)
        Detects and disables several potentially dangerous code snippets in HTML content. It is much slower than cleanAbsolute(java.lang.String), but it preserves the rest of the HTML content.

        The following patterns are tracked and addressed:
        • <SCRIPT>...</SCRIPT>
        • <FRAME>...</FRAME>
        • <LINK>...</LINK>
        • <STYLE>...</STYLE>
        • ...<BODY> and </BODY>...
        • onXXX event handlers in any HTML tags
        • "javascript:" in src and href tag attributes
        When offenses are found, the returned String will likely be longer than the input. It should remain valid HTML, although the disabled fragments become meaningless. For example:
        • <A href="javascript:somethingbad();"> --> <A BADhref="">
        • <IMG onHover="somethingbad();"> --> <IMG BADonHover="">
        • <SCRIPT>SomethingBad()</script> --> <BADscript/>

        Each attack found is logged in a ThreadLocal List<String>, which you can retrieve through getFilterReportForThread().
        Parameters:
        Name - The name associated with this string, used when logging offenses.
        Str - The string to clean up
        Returns:
        the cleaned-up string, which should be identical to Str if no offense was found.
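
        The attribute rewrites listed above (onXXX handlers and "javascript:" URLs) can be approximated with a small regex-based sketch. It is not the library's implementation and does not use _ONXXX_PATTERN or _JS_PATTERN, whose definitions are not shown here. The class AttributeNeutralizerSketch, both patterns, and the caller-supplied report list (in place of the ThreadLocal report) are assumptions made for illustration. Only double-quoted attribute values are handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AttributeNeutralizerSketch {
    // Hypothetical pattern: an onXXX attribute with a double-quoted value.
    private static final Pattern ON_HANDLER = Pattern.compile(
        "\\b(on\\w+)\\s*=\\s*\"[^\"]*\"", Pattern.CASE_INSENSITIVE);

    // Hypothetical pattern: href/src whose double-quoted value starts with "javascript:".
    private static final Pattern JS_URL = Pattern.compile(
        "\\b(href|src)\\s*=\\s*\"\\s*javascript:[^\"]*\"", Pattern.CASE_INSENSITIVE);

    // Disables both kinds of attribute and records each offense in 'report'.
    static String neutralize(String name, String html, List<String> report) {
        String out = rewrite(name, ON_HANDLER, html, report);
        return rewrite(name, JS_URL, out, report);
    }

    // Replaces each match with BAD<attribute>="" and logs the original text.
    private static String rewrite(String name, Pattern p, String html, List<String> report) {
        Matcher m = p.matcher(html);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            report.add(name + ": " + m.group());
            m.appendReplacement(sb, Matcher.quoteReplacement("BAD" + m.group(1) + "=\"\""));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> report = new ArrayList<>();
        System.out.println(neutralize("comment", "<A href=\"javascript:somethingbad();\">", report));
        System.out.println(neutralize("comment", "<IMG onHover=\"somethingbad();\">", report));
        System.out.println(report.size() + " offense(s) logged");
    }
}
```

        Prefixing the attribute name with BAD, rather than deleting the attribute, keeps the tag structurally intact while making sure the browser no longer recognizes the attribute. A production filter would also need to cover single-quoted and unquoted values, which this sketch ignores.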