How to strip HTML tags, scripts, and styles from a web page


Published in: PHP

/**
 * Remove HTML tags, including invisible text such as style and
 * script code, and embedded objects. Add line breaks around
 * block-level tags to prevent word joining after tag removal.
 *
 * The patterns use the /u modifier, so $text must be valid UTF-8.
 */
function strip_html_tags( $text )
{
    $text = preg_replace(
        array(
            // Remove invisible content
            '@<head[^>]*?>.*?</head>@siu',
            '@<style[^>]*?>.*?</style>@siu',
            '@<script[^>]*?>.*?</script>@siu',
            '@<object[^>]*?>.*?</object>@siu',
            '@<embed[^>]*?>.*?</embed>@siu',
            '@<applet[^>]*?>.*?</applet>@siu',
            '@<noframes[^>]*?>.*?</noframes>@siu',
            '@<noscript[^>]*?>.*?</noscript>@siu',
            '@<noembed[^>]*?>.*?</noembed>@siu',
            // Add line breaks before and after blocks
            '@</?((address)|(blockquote)|(center)|(del))@iu',
            '@</?((div)|(h[1-9])|(ins)|(isindex)|(p)|(pre))@iu',
            '@</?((dir)|(dl)|(dt)|(dd)|(li)|(menu)|(ol)|(ul))@iu',
            '@</?((table)|(th)|(td)|(caption))@iu',
            '@</?((form)|(button)|(fieldset)|(legend)|(input))@iu',
            '@</?((label)|(select)|(optgroup)|(option)|(textarea))@iu',
            '@</?((frameset)|(frame)|(iframe))@iu',
        ),
        array(
            // One replacement per pattern: a space for each of the
            // nine invisible-content patterns...
            ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ',
            // ...and a newline before each matched block-level tag
            // for the seven block patterns.
            "\n\$0", "\n\$0", "\n\$0", "\n\$0", "\n\$0", "\n\$0",
            "\n\$0",
        ),
        $text );
    // Remove whatever tags are left, keeping their text content.
    return strip_tags( $text );
}
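
To see why the extra preg_replace() pass is worth having, the short sketch below compares plain strip_tags() with a cut-down version of the same idea on a tiny input. The sample markup and the variable names ($html, $naive, $clean) are made up for illustration, and only two of the patterns from the function above are used, so treat it as a demonstration rather than a replacement for the full function.

```php
<?php
// Hypothetical sample input: two paragraphs followed by an inline script.
$html = '<p>Hello</p><p>World</p><script>var x = 1;</script>';

// strip_tags() alone removes the tags but keeps the script body,
// and the two paragraphs run together into one word.
$naive = strip_tags( $html );

// Same technique as strip_html_tags(), reduced to two patterns:
// replace the script element with a space, then put a newline
// before each opening or closing <div> or <p> tag.
$clean = preg_replace(
    array(
        '@<script[^>]*?>.*?</script>@siu',
        '@</?((div)|(p))@iu',
    ),
    array(
        ' ',
        "\n\$0",
    ),
    $html );

// Strip the remaining tags and trim the leading/trailing whitespace
// left behind by the replacements.
$clean = trim( strip_tags( $clean ) );

echo $naive, "\n";   // HelloWorldvar x = 1;
echo $clean, "\n";   // "Hello" and "World" separated by a blank line
```

The full function does not trim its result, so callers who want compact output may need a similar trim() or whitespace-collapsing step afterwards.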

URL: http://nadeausoftware.com/articles/2007/09/php_tip_how_strip_html_tags_web_page
