Python: Most Used Words

I recently saw a program that gathered the 10 most common words on a webpage and displayed them in a window along with their word count. I decided to build my own using Python and some code I had written before to scrape data from webpages.

# Gives a list of the most common words
# Hunter Thornsberry - hunter@hunterthornsberry.com
from BeautifulSoup import BeautifulSoup  
import urllib2  
import random  
import time

#limit on the number of top words we want to know the count of
limit = 10

#random integer to select user agent
randomint = random.randint(0,7)

#random integer to select sleep time
randomtime = random.randint(1, 30)

#urls to be scraped
urls = ["http://raw.adventuresintechland.com/freedom.html"]

#user agents
user_agents = [  
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11',
    'Opera/9.25 (Windows NT 5.1; U; en)',
    'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)',
    'Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (like Gecko) (Kubuntu)',
    'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.142 Safari/535.19',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:11.0) Gecko/20100101 Firefox/11.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:8.0.1) Gecko/20100101 Firefox/8.0.1',
    'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.151 Safari/535.19'
]

words = []

index = 0  
while len(urls) > index:  
    opener = urllib2.build_opener()
    opener.addheaders = [('User-agent', user_agents[randomint])]
    response = opener.open(urls[index])
    the_page = response.read()
    soup = BeautifulSoup(the_page)

    #Search criteria (is an html tag). Example <p>, <body>, <h1>, etc.
    text = soup.findAll("body")

    #Collect the text of every matching tag, then mark the end of this page
    for tag in text:
        words.append(tag.text)
    print "--End--"
    index = index + 1

words = words[0].split(" ")  
words = [element.lower() for element in words]  
sort = []  
for word in set(words):  
    sort.append(str(words.count(word)) + " " + word)

x = 0  
for item in sorted(sort, reverse=True):  
    print item
    if x == limit:
        break
    x = x + 1

This code comes in two parts. The first part gets the data from the webpage; I have a whole blog post dedicated just to that.
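
For readers on Python 3, where urllib2 no longer exists, the request-building step of the first part might look roughly like the sketch below. This is not the original code: build_request and fetch are hypothetical helper names, the user agent list is shortened for illustration, and the standard library's urllib.request and random.choice stand in for urllib2 and the random index.

```python
import random
import urllib.request

# A shortened, illustrative list; the full script uses eight user agents.
USER_AGENTS = [
    'Opera/9.25 (Windows NT 5.1; U; en)',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:11.0) Gecko/20100101 Firefox/11.0',
]


def build_request(url, user_agents=USER_AGENTS):
    """Return a Request for url with a randomly chosen User-Agent header."""
    agent = random.choice(user_agents)
    return urllib.request.Request(url, headers={'User-Agent': agent})


def fetch(url):
    """Download url and return the raw response body as bytes."""
    with urllib.request.urlopen(build_request(url)) as response:
        return response.read()
```

Calling fetch("http://raw.adventuresintechland.com/freedom.html") would then play the role of response.read() in the script above, assuming the page is still reachable.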

This is the second part of the code:

words = words[0].split(" ")  
words = [element.lower() for element in words]  
sort = []  
for word in set(words):  
    sort.append(str(words.count(word)) + " " + word)

x = 0  
for item in sorted(sort, reverse=True):  
    print item
    if x == limit:
        break
    x = x + 1

Here I am using .split(" ") to break the text into words. Then I make every word lower case to get a true count, since "The" and "the" would otherwise be counted as two different words. Next, the first for loop uses set(words) to get the unique words and, for each one, appends a string made of the number of times that word appears in the words list followed by the word itself.
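
One detail of this step is worth illustrating: .split(" ") splits on single space characters only, so a run of spaces produces empty strings and a newline stays inside a word. The sketch below, written for Python 3, compares that with the no-argument form of split(). The tokenize name and the sample text are made up for the example.

```python
def tokenize(text):
    """Lower-case text and split it on any run of whitespace."""
    return text.lower().split()


sample = "The  quick fox\nsaw the dog"

# Splitting on a single space keeps an empty string and the embedded newline.
print(sample.lower().split(" "))

# Splitting on any whitespace yields only the words themselves.
print(tokenize(sample))
```

Neither form strips punctuation, so "code" and "code." would still be counted separately.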

The second for loop sorts the list and prints the results. Note that sorted() is not a function defined in this script; it is built into Python. We also pass reverse=True so the entry with the highest count comes first.
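
Two things are worth knowing about this approach. Because each entry in sort is a string such as "9 code", sorted() compares entries character by character, so a word that appears 10 times sorts below one that appears 9 times. The loop also prints before it checks x == limit, so it emits limit + 1 lines. The sketch below shows one possible alternative, assuming Python 3 and the standard library's collections.Counter; top_words is a hypothetical helper and the sample sentence is invented.

```python
from collections import Counter


def top_words(words, limit=10):
    """Return up to limit (word, count) pairs, highest count first."""
    return Counter(words).most_common(limit)


words = "the cat and the dog and the bird".split()
for word, count in top_words(words, limit=2):
    print(count, word)
```

Counter keeps the counts as integers, so the ordering is numeric rather than alphabetical, and most_common(limit) returns at most limit entries.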

Output

--End--
9 programmers  
9 other  
9 one  
9 new  
9 few  
9 code  
8 when  
8 says  
8 print  
8 first  
8 didn't  

Every topic I've tweeted about this year

I decided to go through and list the topics I've tweeted about so far this year (as of 02/12/2016). I plan on creating a Twitter star chart like the one done here: https://www.hella.cheap/twitter-star-chart.html

I've redacted some of my friends' names.

Kanye West,25  
Technology,16  
Society,9  
College,8  
Canada,6  
Fashion,5  
Norm Kelly,5  
Video Games,3  
Economics,2  
Networking,2  
Wu-Tang,2  
Beach House,2  
Pokemon,1  
Science,1  
Hometown,1  
Sports,1  
Breaking Bad,1  
Dune,1  
Parents,1  
Politics,1  
Playstation,1  
GonzoHacker,1  
Wu-Tang-Financial,1  
Daft Punk,1  
Rick and Morty,1  
Friend1,1  
Friend2,1  
Travis Scott,1  
Grimes,1  
Florence + The Machine,1  
Drumming,1  
Friend3,1  
New Years,1  

Custom New Tab (Firefox, Chrome, Safari, IE)

If you browse the internet every day like I do, there is a select group of websites you visit almost every day, so why not put links to those websites on your new tab page and save time? Even if you choose not to build a custom new tab page, you can use any website as your new tab.

Why

The easiest way to show you why you should use a custom new tab page is to show you what my homepage and new tab look like.
[Image: My custom start page]
This is what I see any time I open a new tab or a new window in my browser. As you can see, there are three boxes broken down by category, each with its own title and items. Each item is a link to a webpage I commonly use in that category. Above those boxes I put the Arch Linux logo (as Arch Linux is my default OS).

This is just a sample of what you can do. It works by loading a webpage (either local or remote) whenever a new tab or window opens, so you can wield the full force of any HTML, JavaScript, PHP, etc. that you would normally use in a webpage, because it is a webpage. As I said above, this also means you can use ANY webpage, such as Google or a random Wikipedia page (https://en.wikipedia.org/wiki/Special:Random).
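
As a rough illustration of how a page like this could be produced, the sketch below uses Python 3 to build the HTML for a start page with one box of links per category. Everything in it is an assumption made for the example: the build_start_page function, the category names, and the links are not taken from my actual start page.

```python
from html import escape


def build_start_page(categories):
    """Return an HTML page with one titled box of links per category."""
    boxes = []
    for title, links in categories.items():
        items = "\n".join(
            '    <li><a href="{}">{}</a></li>'.format(
                escape(url, quote=True), escape(name)
            )
            for name, url in links
        )
        boxes.append(
            "<div>\n  <h2>{}</h2>\n  <ul>\n{}\n  </ul>\n</div>".format(
                escape(title), items
            )
        )
    return "<!DOCTYPE html>\n<html>\n<body>\n{}\n</body>\n</html>\n".format(
        "\n".join(boxes)
    )


# Hypothetical categories and links, for illustration only.
categories = {
    "Reference": [
        ("Random Wikipedia page", "https://en.wikipedia.org/wiki/Special:Random"),
    ],
    "Search": [("Google", "https://www.google.com/")],
}

page = build_start_page(categories)
print(page)
```

Saving the printed HTML to a local file and entering that file's URL as the custom new tab URL would give a plain, unstyled version of the boxes described above.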

How

  1. Firefox
  2. Chrome
  3. Safari
  4. Internet Explorer

Firefox

1. Download and install the Custom New Tab extension
2. Select the "hamburger" menu in the top right > Add-ons > Extensions
3. Select the "Preferences" option on the Custom New Tab extension and enter your custom new tab URL and choose any other option you want
[Image: Custom New Tab extension Preferences]
After all your preferences are set, your custom new tab page should work!
Notice: there is a slight delay if you choose the "Place focus in URL bar" and "Make URL bar empty" options, meaning some of your text will be erased if you start typing before the URL bar is cleared.


Chrome

1. Download and install New Tab Redirect
2. Select the "hamburger" menu in the top right and select "Settings", then select "Extensions" along the left
3. Under "New Tab Redirect" select "Options" and enter your custom URL
[Image: New Tab Redirect Options]
After all your preferences are set, your custom new tab page should work!


Safari

In Safari you can simply set your homepage as the new tab page. To do this, go to Safari > Preferences > General and set "New tabs open with:" to "Homepage".
[Image: Safari Preferences]
After all your preferences are set, your custom new tab page should work!


Internet Explorer

Coming 12/01/15