Word Frequency Count


Published in: PHP

Credit for this code goes to its original author; see the Stack Overflow discussion at the URL below.


<?php

$filename = "largefile.txt";

/* Read the whole file into $content and lowercase it */
$content = file_get_contents($filename);
if ($content === false) {
    die("Could not read $filename");
}
$content = strtolower($content);

/* Split $content into words on any run of non-letter characters */
$wordArray = preg_split('/[^a-z]+/', $content, -1, PREG_SPLIT_NO_EMPTY);

/* Filter out stop words; the leading "." also drops every single-letter word */
$filteredArray = array_filter($wordArray, function ($x) {
    return !preg_match("/^(.|a|an|and|the|this|at|in|or|of|is|for|to)$/", $x);
});

/* Build an associative array: word => number of occurrences */
$wordFrequencyArray = array_count_values($filteredArray);

/* Sort by count, highest first, keeping the word keys */
arsort($wordFrequencyArray);

/* Take the top 10 entries (the array is already sorted) */
$top10words = array_slice($wordFrequencyArray, 0, 10);

/* Display them */
foreach ($top10words as $topWord => $frequency) {
    echo "$topWord -- $frequency<br/>";
}

?>
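
The snippet reads the whole file into memory with file_get_contents(), which may be a problem if largefile.txt really is large. Below is a minimal sketch of the same steps that reads one line at a time instead. The function name topWords() and its parameters are hypothetical and not part of the original snippet. It also swaps the stop-word regex for an array lookup, which is an assumption about what is convenient rather than anything the original author suggested.

```php
<?php
/**
 * Hypothetical helper (not from the original snippet): count word
 * frequencies while reading the file one line at a time.
 *
 * @param string   $filename  path to a text file
 * @param int      $limit     how many of the most frequent words to return
 * @param string[] $stopWords lowercase words to ignore
 * @return array<string,int>  word => count, most frequent first
 */
function topWords(string $filename, int $limit = 10, array $stopWords = []): array
{
    $handle = @fopen($filename, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $filename");
    }

    // Flip the list so each stop word becomes a key for fast isset() checks.
    $stop = array_flip($stopWords);
    $counts = [];

    while (($line = fgets($handle)) !== false) {
        // Same tokenisation as the snippet: lowercase, split on non-letters.
        // A newline is a non-letter, so no word can span two lines.
        $words = preg_split('/[^a-z]+/', strtolower($line), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($words as $word) {
            // Skip single letters (the "." in the original regex) and stop words.
            if (strlen($word) < 2 || isset($stop[$word])) {
                continue;
            }
            $counts[$word] = ($counts[$word] ?? 0) + 1;
        }
    }
    fclose($handle);

    // Highest counts first, keeping the word keys, then keep the first $limit.
    arsort($counts);
    return array_slice($counts, 0, $limit, true);
}
```

Calling topWords("largefile.txt", 10, ["a", "an", "and", "the", "this", "at", "in", "or", "of", "is", "for", "to"]) should return the same kind of word => count array as $top10words above, which can be printed with the same foreach loop. Memory use then depends on the number of distinct words rather than on the file size.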

URL: http://stackoverflow.com/questions/3169051/code-golf-word-frequency-chart
