Published in: PHP
I found this nice little bot on www.php.net, thanks to the author of the script on the preg_replace page.
This bot returns the text content of a URL. It could be used to pull the text from a site and find relevant words to search for.
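function webpage2txt($url) {
    $user_agent = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";

    // The download step is missing from the snippet as shown; cURL is assumed here.
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
    $document = (string) curl_exec($ch);

    $search = array(
        '@<script[^>]*?>.*?</script>@si', // Strip out JavaScript (assumed entry)
        '@<style[^>]*?>.*?</style>@siU',  // Strip style tags properly
        '@<[\/\!]*?[^<>]*?>@si',          // Strip out HTML tags
        '@<![\s\S]*?--[ \t\n\r]*>@',      // Strip multi-line comments including CDATA
        '/\s{2,}/',
    );
    // Assumed glue: apply the patterns to the decoded page.
    $text = preg_replace($search, "\n", html_entity_decode($document));

    $pat[0] = "/^\s+/";
    $pat[2] = "/\s+\$/";
    $rep[0] = "";
    $rep[2] = " ";
    $text = preg_replace($pat, $rep, trim($text));

    return $text;
}

echo webpage2txt("http://www.repubblica.it");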
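The description mentions using the extracted text to find relevant words. As a rough sketch of that idea, the helper below counts word frequencies in a block of plain text. The name relevant_words, its parameters, and the "length of at least 4 bytes" filter are assumptions made for illustration; they are not part of the original script, and a real keyword extractor would probably want a proper stop-word list.

```php
<?php
// Hypothetical helper (not part of the original snippet): returns the most
// frequent words in a plain-text string as word => count, most frequent first.
function relevant_words(string $text, int $limit = 5, int $min_length = 4): array
{
    // Lower-case the text; fall back to strtolower() if mbstring is unavailable.
    $text = function_exists('mb_strtolower')
        ? mb_strtolower($text, 'UTF-8')
        : strtolower($text);

    // Split on anything that is not a letter or a digit (Unicode-aware).
    $words = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($words === false) {
        return array(); // e.g. the input was not valid UTF-8
    }

    $counts = array();
    foreach ($words as $word) {
        // Crude filter: skip very short words (length measured in bytes).
        if (strlen($word) >= $min_length) {
            $counts[$word] = ($counts[$word] ?? 0) + 1;
        }
    }

    arsort($counts); // sort by count, descending, keeping the word keys
    return array_slice($counts, 0, $limit, true);
}

// Usage with a fixed string, so the example runs without network access.
print_r(relevant_words("Rome news: Rome traffic, Rome weather and traffic updates", 2));
```

With the function from this post, the same helper could be fed the page text directly, for example relevant_words(webpage2txt($url)).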
URL: http://www.barattalo.it/2010/01/16/php-web-page-to-text-function/