Revision: 30418
at August 13, 2010 11:07 by math89

Initial Code
function html2txt($document){
     $search = array(
          '@<script[^>]*?>.*?</script>@si', // Strip out JavaScript
          '@<style[^>]*?>.*?</style>@si',   // Strip style tags properly (no U flag: it would make .*? greedy)
          '@<[?]php[^>].*?[?]>@si',         // Strip out PHP blocks
          '@<[?][^>].*?[?]>@si',            // Strip out short-tag PHP blocks
          '@<[\/\!]*?[^<>]*?>@si',          // Strip out HTML tags
          '@<![\s\S]*?--[ \t\n\r]*>@'       // Strip multi-line comments including CDATA
     );
     $text = preg_replace($search, '', $document);
     return $text;
}

// Usage

$html_source = file_get_contents('');
$txt = html2txt($html_source);

Initial URL

Initial Description
Turn an HTML source into a plain-text document by removing all HTML tags and other unneeded code, such as scripts, style blocks, PHP blocks, and comments.
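
As a rough illustration of the approach described above, here is a minimal sketch run against an inline HTML string. The function name `html2txt_sketch`, the sample markup, and the final whitespace-collapsing step are assumptions made for this example and are not part of the original snippet. The type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper for illustration only; it follows the same
// "list of patterns passed to preg_replace" idea as the snippet.
function html2txt_sketch(string $document): string
{
    $search = array(
        '@<script[^>]*?>.*?</script>@si', // script blocks, including their content
        '@<style[^>]*?>.*?</style>@si',   // style blocks, including their content
        '@<!--.*?-->@s',                  // HTML comments
        '@<[^<>]*?>@s',                   // any remaining tags
    );
    // Patterns are applied one after another, in array order.
    $text = preg_replace($search, '', $document);

    // Assumption: collapsing runs of whitespace is wanted for readable output.
    return trim(preg_replace('@\s+@', ' ', $text));
}

$html = '<html><head><style>p { color: red; }</style></head>'
      . '<body><!-- note --><p>Hello <b>world</b></p>'
      . '<script>alert("x");</script></body></html>';

echo html2txt_sketch($html), "\n"; // prints "Hello world"
```

Regex-based stripping like this is a heuristic, not an HTML parser. It can misfire on malformed markup or on `<` and `>` characters inside attribute values.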

Initial Title
Convert HTML source to full text

Initial Tags
html, text

Initial Language