Nov. 06, 2012

Python Mechanize Cheat Sheet

Mechanize

A very useful python module for navigating through web forms is Mechanize. 

In a previous post I wrote about "Browsing in Python with Mechanize". 

Today I found this excellent cheat sheet on scraperwiki that I would like to
share. 

Create a browser object

Create a browser object and give it some optional settings.
import mechanize
br = mechanize.Browser()
br.set_all_readonly(False)    # allow everything to be written to
br.set_handle_robots(False)   # ignore robots
br.set_handle_refresh(False)  # can sometimes hang without this
br.addheaders =   	      	# [('User-agent', 'Firefox')]

Open a webpage

Open a webpage and inspect its contents
response = br.open(url)
print response.read()      # the text of the page
response1 = br.response()  # get the response again
print response1.read()     # can apply lxml.html.fromstring()

Using forms

List the forms that are in the page
for form in br.forms():
    print "Form name:", form.name
    print form
To go on the mechanize browser object must have a form selected
br.select_form("form1")         # works when form has a name
br.form = list(br.forms())[0]  # use when form is unnamed

Using Controls

Iterate through the controls in the form.
for control in br.form.controls:
    print control
    print "type=%s, name=%s value=%s" % (control.type, control.name, br[control.name])
Controls can be found by name
control = br.form.find_control("controlname")
Having a select control tells you what values can be selected
if control.type == "select":  # means it's class ClientForm.SelectControl
    for item in control.items:
    print " name=%s values=%s" % (item.name, str([label.text  for label in item.get_labels()]))
Because 'Select' type controls can have multiple selections, they must be set
with a list, even if it is one element.
print control.value
print control  # selected value is starred
control.value = ["ItemName"]
print control
br[control.name] = ["ItemName"]  # equivalent and more normal
Text controls can be set as a string
if control.type == "text":  # means it's class ClientForm.TextControl
    control.value = "stuff here"
br["controlname"] = "stuff here"  # equivalent
Controls can be set to readonly and disabled.
control.readonly = False
control.disabled = True
OR disable all of them like so
for control in br.form.controls:
   if control.type == "submit":
       control.disabled = True

Submit the form

When your form is complete you can submit
 
response = br.submit()
print response.read()
br.back()   # go back

Finding Links

Following links in mechanize is a hassle because you need the have the link
object.

Sometimes it is easier to get them all and find the link you want from the text.
for link in br.links():
    print link.text, link.url
Follow link and click links is the same as submit and click
request = br.click_link(link)
response = br.follow_link(link)
print response.geturl()
I hope that you got more understanding of the Mechanize module in Python.

Recommended Python Training – Treehouse

Treehouse For Python training, our top recommendation is Treehouse.

Treehouse is an online training service that teaches web design, web development and app development with videos, quizzes and interactive coding exercises.

Treehouse has beginner to advanced Python training that programmers of all levels benefit from.



Read more about:
Disclosure of Material Connection: Some of the links in the post above are “affiliate links.” This means if you click on the link and purchase the item, I will receive an affiliate commission. Regardless, PythonForBeginners.com only recommend products or services that we try personally and believe will add value to our readers.