Python is great for a lot of things. Here's another: web scraping. Web scraping is the act of programmatically grabbing information from webpages, typically from the HTML returned by a website.
NOTE: Web scraping can be abused and in many cases will get you banned from websites (sorry pastebin!). Only use it when you are 100% positive it is allowed.
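One quick signal for whether a site tolerates scraping is its robots.txt file. Here is a minimal sketch using Python 3's standard urllib.robotparser. The robots.txt content, the "my-scraper" user agent, and the is_allowed helper are all made up for illustration, and a site's terms of service still apply even when robots.txt says yes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "my-scraper" and the example.com URLs are placeholders, not real targets.
print(is_allowed(ROBOTS_TXT, "my-scraper", "http://example.com/index.html"))
print(is_allowed(ROBOTS_TXT, "my-scraper", "http://example.com/private/data.html"))
```

Against a live site you would call set_url() with the address of the site's robots.txt and then read() instead of passing the text in by hand.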
We're going to use urllib2 and Beautiful Soup, although you can use your choice of HTTP library (requests is another big one) in place of urllib2. Note that urllib2 exists only in Python 2; in Python 3 the same functions live in urllib.request.
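    from bs4 import BeautifulSoup
    import urllib2

    # Fetch the page and read the raw HTML
    req = urllib2.Request("http://www.crummy.com/software/BeautifulSoup/")
    response = urllib2.urlopen(req)
    the_page = response.read()

    # Parse the HTML; naming the parser explicitly avoids a bs4 warning
    soup = BeautifulSoup(the_page, "html.parser")

    # findAll returns a list of tags, so loop to get each tag's text
    paragraphs = soup.findAll("p")
    for p in paragraphs:
        print p.text

In this example we find all of the <p> tags (findAll returns them as a list, so we loop over it) and print the text of each one with p.text.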
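For readers on Python 3, here is a rough sketch of the same idea. It assumes bs4 is installed. The helper names fetch_html and paragraph_texts and the sample HTML are made up for illustration, and find_all is the newer spelling of findAll.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its raw HTML bytes (needs network access)."""
    with urlopen(url) as response:
        return response.read()


def paragraph_texts(html):
    """Return the text of every <p> tag in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list-like ResultSet, so read .text from each tag
    return [p.text for p in soup.find_all("p")]


# Inline sample HTML (made up for illustration) keeps the demo offline.
SAMPLE = "<html><body><p>First</p><div><p>Second</p></div></body></html>"
print(paragraph_texts(SAMPLE))
```

To run it against the page from the example, call paragraph_texts(fetch_html("http://www.crummy.com/software/BeautifulSoup/")).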
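If we wanted the complete text of the webpage, we could search for the html tag instead: soup.findAll("html") returns a one-item list holding the root tag, and that tag's text is everything between the opening and closing html tags. Calling soup.get_text() is a more direct route to the page's text.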
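A small sketch of the whole-page case, assuming Python 3 with bs4 installed and a made-up sample page. It shows that searching for the html tag still returns a list, so you take the first item before reading its text, and that get_text can insert a separator between the pieces.

```python
from bs4 import BeautifulSoup

# Made-up sample page for illustration
SAMPLE = "<html><body><h1>Hi</h1><p>Some text.</p></body></html>"

soup = BeautifulSoup(SAMPLE, "html.parser")

# find_all returns a list even when only one tag matches
html_tags = soup.find_all("html")
root_text = html_tags[0].text

# get_text can join the pieces with a separator and strip whitespace
page_text = soup.get_text(" ", strip=True)

print(root_text)
print(page_text)
```

Without a separator the heading and paragraph text run together, which is why the second call passes " " explicitly.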