
Revision: 55130
at January 27, 2012 13:29 by clopez


Initial Code
tr -sc 'A-Za-z' '\012' < text.txt | sort | uniq -c | sort -nr > output_ngram.txt

Initial URL


Initial Description
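Run this on a file named text.txt to write its word frequency distribution to output_ngram.txt, one word per line preceded by its count, most frequent first. Only the ASCII letters A-Z and a-z are treated as word characters, so accented words are cut at the accent (the fragment "aplicaci" below presumably comes from "aplicación"), and capitalized forms are counted separately from lowercase ones ("Por" and "por"). Sample output: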

  30 m
  29 por
  29 aplicaci
  27 modelo
  27 datos
  24 con
  21 este
  21 esta
  20 En
  18 posible
  18 palabras
  18 como
  17 texto
  14 tem
  14 no
  14 documentos
  14 cada
  14 Por
  13 ya
  13 todo
  13 textos
  13 proceso
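
The listing above counts "Por" and "por" as different words because the pipeline is case-sensitive. A minimal sketch of a case-insensitive variant follows. The function name word_freq and the inline demo text are hypothetical and not part of the original snippet. Like the original, it recognizes only the ASCII letters a-z, so accented words are still split.

```bash
# Hypothetical helper (not part of the original snippet): count words
# case-insensitively. Reads text on stdin, writes "count word" lines.
word_freq() {
  tr 'A-Z' 'a-z' |          # fold ASCII upper case to lower case
    tr -sc 'a-z' '\012' |   # turn each run of non-letters into one newline
    sed '/^$/d' |           # drop the empty line left by leading separators
    sort | uniq -c | sort -nr
}

# Small demo on inline text instead of text.txt (assumed sample sentence)
printf 'Por ejemplo, por cada texto: Por.\n' | word_freq
```

To mirror the original workflow, something like `word_freq < text.txt > output_ngram.txt` should work.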

Initial Title
Get word frequency distribution

Initial Tags
Bash, text

Initial Language
Bash