
Using Project Gutenberg’s free text of Don Quijote and the techniques from Unix for Poets, here are the most frequently used (non-short) words in Miguel de Cervantes’ famous work:

  • 2167 Quijote
  • 2145 Sancho
  • 1331 porque
  • 1053 respondió
  • 1027 había
  •  900 merced
  •  813 vuestra
  •  796 todos
  •  711 cuando
  •  625 donde
  •  614 quien
  •  577 decir
  •  573 caballero
  •  535 hacer
  •  525 aunque
  •  482 aquel
  •  464 aquí
  •  462 estaba
  •  450 sobre
  •  431 está
  •  416 tanto
  •  414 verdad
  •  409 allí
  •  398 tengo
  •  393 mundo
  •  385 tiene
  •  383 alguna
  •  377 hasta
  •  371 dicho
  •  363 parte
  •  361 entre
  •  359 todas
  •  358 buena
  •  353 luego
  •  346 cosas
  •  344 menos
  •  344 lugar
  •  342 tenía
  •  328 manera
  •  328 aquella
  •  327 tiempo
  •  325 Panza
  •  310 ahora
  •  304 puesto
  •  292 caballeros
  •  289 ellos
  •  287 mucho
  •  285 fuera
  •  283 puede
  •  282 antes
  •  281 mejor
  •  281 algún
  •  280 visto
  •  279 Dulcinea
  •  272 tierra
  •  269 otras
  •  258 padre
  •  258 otros
  •  258 hombre
  •  257 hecho
  •  254 haber
  •  253 quiero
  •  252 cielo
  •  250 habían
  •  248 amigo
  •  247 saber
  •  246 historia
  •  245 camino
  •  242 tener
  •  240 escudero
  •  239 parece
  •  239 manos
  •  238 días
  •  234 muchas
  •  231 estas
  •  222 mujer
  •  222 desta
  •  221 será
  •  219 mesmo
  •  219 cuanto
  •  219 cómo
  •  215 quién
  •  214 cabeza
  •  211 punto
  •  211 noche
  •  207 veces
  •  207 replicó
  •  205 cuenta
  •  203 Rocinante
  •  202 parecer
  •  200 razones
  •  199 también
  •  198 fuese
  •  198 duque
  •  198 diciendo
  •  197 andante
  •  196 muchos
  •  196 estos
  •  196 caballo
  •  195 vuesa
  •  195 nuestro
  •  193 podía

CODE: tr -sc '[A-Z][a-z][áéíóú]' '[\012*]' < quijote.textfile | perl -e 'while (<>) { print if length($_)>5; }' | sort | uniq -c | sort -rn > quijote.hist
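
Read stage by stage, the pipeline above splits the text into one word per line, drops the short words, counts duplicates, and sorts by count. Here is a minimal sketch of the same idea on a toy sentence, so each stage can be checked by eye. The function name `word_hist` and the sample text are made up for illustration, `awk` stands in for the perl length filter, and the character class is plain ASCII, so unlike the command above it would split accented words.

```shell
# word_hist: hypothetical helper, reads text on stdin and prints "count word" lines.
word_hist() {
  tr -sc 'A-Za-z' '\n' |      # every run of non-letters becomes a single newline
    awk 'length($0) >= 5' |   # keep words of 5+ characters (perl's length($_)>5 also counts the newline)
    sort |                    # bring identical words together
    uniq -c |                 # count each run of identical lines
    sort -rn                  # most frequent first
}

# Toy input (not taken from the book).
printf '%s\n' 'sancho rode and sancho spoke and quijote listened to sancho' | word_hist
```

The first output line should pair the count 3 with `sancho`; the exact column padding produced by `uniq -c` differs between systems.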

Here’s the power law distribution of non-short words in Don Quijote:

CODE: tr -sc '[A-Z][a-z][áéíóú]' '[\012*]' < quijote | perl -e 'while (<>) { print if length($_)>5; }' | sort | uniq -c | sort -rn | perl -e 'while (<>) { print $1 if $_ =~ /(\d+)/; print "\n"; } ' | uniq -c > quijote.countofcounts.powerlaw.hist
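
The second half of this pipeline takes the sorted histogram, keeps only the count on each line, and runs `uniq -c` again, which gives a count of counts. Below is a small sketch with an invented histogram; `count_of_counts` is a hypothetical name, and `awk '{print $1}'` is swapped in for the perl regex.

```shell
# count_of_counts: hypothetical helper, reads "count word" lines sorted by count.
count_of_counts() {
  awk '{print $1}' |  # keep only the count column
    uniq -c           # equal counts are already adjacent, so uniq can tally them
}

# Toy histogram in "count word" form, sorted descending (invented values).
printf '%s\n' '3 sancho' '1 spoke' '1 quijote' '1 listened' | count_of_counts
```

In the output, the first column is how many words share a frequency and the second column is that frequency.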

> par(bg="#fafaff", col="#111177")
> plot(quijote.countofcounts.powerlaw, log="y", type="s", lwd=4, xlab="Number of times a word appears in the text", ylab="Number of words with this frequency", main="Word Frequency in Don Quijote de la Mancha", col="#111177")
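
For reference, this is the textbook form of a power law, not something fitted to the data here. The symbols are assumptions introduced for illustration: n(f) is the number of words that appear f times, and C and α are constants.

```latex
n(f) = C\, f^{-\alpha}
\quad\Longrightarrow\quad
\log n(f) = \log C - \alpha \log f
```

A power law is therefore a straight line when both axes are logarithmic. The plot call above puts only the y axis on a log scale; in R, `log="xy"` is the option that makes both axes logarithmic.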

Including the short words as well retains the power law distribution:

CODE: tr -sc '[A-Z][a-z][áéíóú]' '[\012*]' < quijote | sort | uniq -c | sort -rn | perl -e 'while (<>) { print $1 if $_ =~ /(\d+)/; print "\n"; }' | uniq -c > quijote.countofcounts.powerlaw.hist.shortwordstambien
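
One thing worth remembering with pipelines like this: `uniq -c` only merges adjacent identical lines, so the word list has to pass through `sort` first, or repeated words that are not neighbors get counted separately. Here is a tiny demonstration on invented input (`words` is a hypothetical helper):

```shell
# Three lines, with "de" appearing twice but not next to itself.
words() { printf '%s\n' de la de; }

# Without sort: uniq sees three separate runs, each counted once.
words | uniq -c

# With sort: the two "de" lines become adjacent and are counted together.
words | sort | uniq -c
```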

(Source: gutenberg.org)
