Clean Word HTML using Regular Expressions


Published in: PHP

The PHP code appears in the comments on the original post.


/**
 * Removes all FONT and SPAN tags, and all class and style attributes.
 * Designed to get rid of non-standard Microsoft Word HTML tags.
 */
function cleanHTML($html) {
    // Start by completely removing all unwanted tags.
    // preg_replace is used because ereg_replace was removed in PHP 7.
    $html = preg_replace('#</?(?:font|span|del|ins)\b[^>]*>#i', '', $html);

    // Then make repeated passes over the HTML, removing one unwanted
    // attribute per tag on each pass, until none are left.
    $pattern = '#<([^>]*?)\s+(?:class|lang|style|size|face)\s*=\s*'
             . '(?:"[^"]*"|\'[^\']*\'|[^\s>]+)([^>]*)>#i';
    do {
        // Keep the text before ($1) and after ($2) the removed attribute.
        $html = preg_replace($pattern, '<$1$2>', $html, -1, $count);
    } while ($count > 0);

    return $html;
}
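
The attribute pass removes only one attribute per tag each time it runs, which is why the snippet repeats it. As a rough sketch of an alternative, the attributes of each tag could be stripped in a single pass with preg_replace_callback. The helper name stripWordAttributes and the sample markup are hypothetical and not part of the original snippet, and the sketch assumes attribute values never contain a ">" character.

```php
<?php
// Hypothetical helper (not from the original snippet): strips the unwanted
// attributes from every opening tag in one pass instead of repeated passes.
function stripWordAttributes($html) {
    // Match each opening tag: tag name in group 1, attribute text in group 2.
    return preg_replace_callback(
        '#<([a-z][a-z0-9]*)\b([^>]*)>#i',
        function ($tag) {
            // Drop every class/lang/style/size/face attribute inside this tag.
            $attrs = preg_replace(
                '#\s+(?:class|lang|style|size|face)\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+)#i',
                '',
                $tag[2]
            );
            return '<' . $tag[1] . $attrs . '>';
        },
        $html
    );
}

// Assumed sample of Word-style markup, for illustration only.
$sample = '<p class="MsoNormal" style="margin:0" align="left">Hello</p>';
echo stripWordAttributes($sample), "\n"; // prints <p align="left">Hello</p>
```

Because each tag is handled by the callback once, the number of unwanted attributes on a tag no longer matters, and attributes outside the list (such as align above) are left in place.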

URL: http://tim.mackey.ie/CommentView,guid,2ece42de-a334-4fd0-8f94-53c6602d5718.aspx
