Navigating the tree:

Going down:

Navigating using tag names

h_b_div_paragraphs = soup.html.body.div.p

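This gets the first <p> element inside the first div inside the body inside the html element. Navigating by tag name only ever returns the first tag with that name; use find_all() to get all of them.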
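
A minimal sketch of tag-name navigation next to find_all(). The markup is made up for illustration, and html.parser is an assumed parser choice:

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
html = "<html><body><div><p>first</p><p>second</p></div></body></html>"
soup = BeautifulSoup(html, "html.parser")

first_p = soup.html.body.div.p            # only the first <p> in the div
all_p = soup.html.body.div.find_all("p")  # every <p> in the div

print(first_p.get_text())                 # first
print([p.get_text() for p in all_p])      # ['first', 'second']
```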

.contents and .children

div_children = soup.div.children
div_contents = soup.div.contents

Both give the direct children of the element being looked at: .contents as a list, .children as an iterator over the same children.
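
As a rough illustration (the markup is invented, html.parser is an assumed parser choice), both attributes see only the two <p> tags and not the nested <b>:

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
html = "<div><p>one</p><p>two <b>bold</b></p></div>"
soup = BeautifulSoup(html, "html.parser")

div_contents = soup.div.contents   # a list of direct children
div_children = soup.div.children   # an iterator over the same children

contents_names = [child.name for child in div_contents]
children_names = [child.name for child in div_children]
print(contents_names)              # ['p', 'p']
print(children_names)              # ['p', 'p']
```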



You can access a tag's attributes by treating the tag like a dictionary (for example tag['href']), and you can access that dictionary directly as .attrs.
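
A small sketch of attribute access, using an invented <a> tag and html.parser as an assumed parser:

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
html = '<a href="/docs" id="link1" class="external big">Docs</a>'
soup = BeautifulSoup(html, "html.parser")
link = soup.a

href = link["href"]          # dictionary-style lookup
classes = link["class"]      # class is multi-valued, so this is a list
missing = link.get("title")  # .get() returns None instead of raising KeyError
attrs = link.attrs           # the underlying dictionary

print(href)                  # /docs
print(classes)               # ['external', 'big']
print(missing)               # None
```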


div_descendants = soup.div.descendants

This iterates over all the descendants of the element being looked at: its direct children, their children, and so on, including the strings inside them.
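
To make the difference from .children concrete, here is a sketch on invented markup (html.parser is an assumed parser choice). Strings are filtered out so only tag names are shown:

```python
from bs4 import BeautifulSoup, Tag

# Made-up markup for illustration only.
html = "<div><p>one <b>bold</b></p></div>"
soup = BeautifulSoup(html, "html.parser")

child_tags = [c.name for c in soup.div.children if isinstance(c, Tag)]
descendant_tags = [d.name for d in soup.div.descendants if isinstance(d, Tag)]

print(child_tags)        # ['p']
print(descendant_tags)   # ['p', 'b']
```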


div_link_text = soup.div.a.string

If a tag has only one child, and that child is a NavigableString, the child is made available as .string. If the tag contains more than one thing, .string is None.
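
A quick sketch of both cases, on invented markup with html.parser as an assumed parser:

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
html = "<div><a href='#'>Home</a><p>Some <b>bold</b> text</p></div>"
soup = BeautifulSoup(html, "html.parser")

div_link_text = soup.div.a.string   # the <a> has a single string child
paragraph_text = soup.div.p.string  # the <p> has several children

print(div_link_text)    # Home
print(paragraph_text)   # None
```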

.strings and .stripped_strings

div_text = soup.div.strings

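If there's more than one thing inside a tag, you can still look at just the strings. Use the .strings generator, or .stripped_strings to get the same strings with surrounding whitespace removed and whitespace-only strings skipped.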
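
The two generators can be compared on a snippet with deliberately messy whitespace (invented markup, html.parser assumed):

```python
from bs4 import BeautifulSoup

# Made-up markup with extra whitespace, for illustration only.
html = "<div>\n  <p> Hello </p>\n  <p>World</p>\n</div>"
soup = BeautifulSoup(html, "html.parser")

raw = list(soup.div.strings)             # includes whitespace-only strings
clean = list(soup.div.stripped_strings)  # stripped, blanks skipped

print(clean)   # ['Hello', 'World']
```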

Going up:


title = soup.title.string.parent

You can access an element's parent with the .parent attribute. The string in the title tag has a parent: the title tag itself.


link = soup.a
for parent in link.parents:
    print(parent.name)

You can iterate over all of an element's parents with .parents. The example above uses .parents to travel from an <a> tag buried deep within the document to the very top of the document.
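
For a sense of what that walk produces, here is a self-contained sketch on invented markup (html.parser assumed). The last parent is the BeautifulSoup object itself, whose name is '[document]':

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
html = "<html><body><div><p><a href='#'>link</a></p></div></body></html>"
soup = BeautifulSoup(html, "html.parser")

link = soup.a
parent_names = [parent.name for parent in link.parents]
print(parent_names)   # ['p', 'div', 'body', 'html', '[document]']
```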

Going sideways


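The .next_sibling/.previous_sibling and .next_element/.previous_element attributes move between page elements one step at a time and return None when there is nothing further. The plural forms (.next_siblings, .previous_siblings, .next_elements, .previous_elements) are generators that yield every remaining element in that direction. Siblings share the same parent; elements follow the order in which the document was parsed.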
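
A sketch of sideways navigation on three invented sibling links (html.parser assumed). Note that .next_element of a tag is usually the string inside it, not the next tag:

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
html = "<div><a id='one'>1</a><a id='two'>2</a><a id='three'>3</a></div>"
soup = BeautifulSoup(html, "html.parser")

first = soup.find(id="one")
next_tag = first.next_sibling                    # <a id="two">
prev_tag = first.previous_sibling                # None, nothing comes before
later_ids = [s["id"] for s in first.next_siblings]
next_el = first.next_element                     # the string "1" inside first

print(next_tag["id"])   # two
print(prev_tag)         # None
print(later_ids)        # ['two', 'three']
print(next_el)          # 1
```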

Searching the tree

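.find(), .find_all() and the related find_*() methods
(find_parent(s), find_next_sibling(s), find_previous_sibling(s), find_next/find_all_next, find_previous/find_all_previous)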

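.find() and its relatives return the first result, or None if nothing matches; .find_all() and its relatives return a list of all results, which is empty if nothing matches.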
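
A minimal sketch of the two return styles, on invented markup with html.parser as an assumed parser:

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
html = "<p>one</p><p>two</p>"
soup = BeautifulSoup(html, "html.parser")

first_p = soup.find("p")              # first match only
all_p = soup.find_all("p")            # list of every match
no_table = soup.find("table")         # None when nothing matches
no_tables = soup.find_all("table")    # empty list when nothing matches

print(first_p.get_text())             # one
print(len(all_p))                     # 2
print(no_table, list(no_tables))      # None []
```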

The limit argument

soup.find_all("a", limit=2)

The recursive argument.

soup.find_all("a", recursive=False)

limit stops the search after the given number of results; recursive=False restricts the search to the direct children of the tag it is called on.
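
Both arguments can be tried on a small invented snippet (html.parser assumed). Calling find_all() with recursive=False on the soup object itself only looks at the top-level nodes, so it is usually more useful on a specific tag:

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
html = "<div><a>1</a><p><a>2</a></p><a>3</a></div>"
soup = BeautifulSoup(html, "html.parser")

limited = [a.get_text() for a in soup.find_all("a", limit=2)]
direct = [a.get_text() for a in soup.div.find_all("a", recursive=False)]
top_level = soup.find_all("a", recursive=False)  # <div> is the only top-level node

print(limited)           # ['1', '2']
print(direct)            # ['1', '3']
print(list(top_level))   # []
```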

Modifying the tree

Changing tag names and attributes

tag.name = "blockquote"
tag['class'] = 'verybold'

Change a tag's name by assigning to .name, and change its attributes by treating the tag like a dictionary of key-value pairs.
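
A self-contained sketch of renaming a tag and setting an attribute. The starting <b> tag is invented for illustration and html.parser is an assumed parser:

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', "html.parser")
tag = soup.b

tag.name = "blockquote"       # rename the tag
tag["class"] = "verybold"     # overwrite an attribute like a dict entry

renamed = str(tag)
print(renamed)   # <blockquote class="verybold">Extremely bold</blockquote>
```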

Modifying tag.string

tag = soup.a
tag.string = "New link text."

Replaces the tag's contents with the string you give. Any tags that were inside it are destroyed along with the old contents.
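
A short sketch showing that a nested <i> tag disappears when .string is assigned (invented markup, html.parser assumed):

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, "html.parser")

tag = soup.a
tag.string = "New link text."   # old contents, including <i>, are replaced

replaced = str(tag)
print(replaced)   # <a href="http://example.com/">New link text.</a>
```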


Tag.append() adds to a tag's contents. It works just like calling .append() on a Python list


You can .append() a new string or new tag to the document
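
A minimal sketch of appending a string and then a new tag, assuming the new tag is created with soup.new_tag(). The markup is invented and html.parser is an assumed parser:

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
soup = BeautifulSoup("<p>Hello</p>", "html.parser")

soup.p.append(", world")       # append a plain string

new_b = soup.new_tag("b")      # create a new, empty <b> tag
new_b.string = "!"
soup.p.append(new_b)           # append the new tag after the string

appended = str(soup.p)
print(appended)   # <p>Hello, world<b>!</b></p>
```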


With .insert(), the tag or string will be inserted at whatever numeric position you say, just like .insert() on a Python list.
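
As an illustration (invented markup, html.parser assumed), inserting at position 1 puts the new string between the existing string and the <i> tag:

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, "html.parser")

tag = soup.a
tag.insert(1, "but did not endorse ")   # position 1 is before the <i> tag

inserted = str(tag)
print(inserted)
# <a href="http://example.com/">I linked to but did not endorse <i>example.com</i></a>
```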

.insert_before() and .insert_after()

The .insert_before()/.insert_after() methods insert a tag or string immediately before or after the target element
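
A sketch of both methods on an invented <b> tag, using soup.new_tag() and soup.new_string() to build the pieces (html.parser assumed):

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
soup = BeautifulSoup("<b>stop</b>", "html.parser")

tag = soup.new_tag("i")
tag.string = "Please"
soup.b.string.insert_before(tag)        # new <i> goes before the string "stop"
after_first = str(soup.b)

soup.b.i.insert_after(soup.new_string(" do not "))  # string goes after <i>
after_second = str(soup.b)

print(after_first)    # <b><i>Please</i>stop</b>
print(after_second)   # <b><i>Please</i> do not stop</b>
```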


.clear() removes the contents of a tag
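
A minimal sketch of .clear() on invented markup (html.parser assumed). The tag and its attributes stay; only what is inside goes:

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, "html.parser")

tag = soup.a
tag.clear()          # drop the string and the nested <i>

cleared = str(tag)
print(cleared)       # <a href="http://example.com/"></a>
```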


.extract() removes a tag or string from the tree. It returns the tag or string that was extracted
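
A sketch of .extract() on invented markup (html.parser assumed). The extracted tag survives as a detached element with no parent:

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, "html.parser")

a_tag = soup.a
i_tag = soup.i.extract()   # remove <i> from the tree and keep it

print(a_tag)               # <a href="http://example.com/">I linked to </a>
print(i_tag)               # <i>example.com</i>
print(i_tag.parent)        # None
```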


.decompose() removes a tag from the tree, then completely destroys it and its contents
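
The same invented markup shows .decompose(), which unlike .extract() gives nothing back to keep (html.parser assumed):

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, "html.parser")

a_tag = soup.a
soup.i.decompose()         # <i> and its contents are destroyed

decomposed = str(a_tag)
print(decomposed)          # <a href="http://example.com/">I linked to </a>
```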


.replace_with() removes a tag or string from the tree, and replaces it with the tag or string of your choice
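
A sketch of swapping one tag for another on invented markup (html.parser assumed). The call hands back the element that was replaced:

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, "html.parser")

a_tag = soup.a
new_tag = soup.new_tag("b")
new_tag.string = "example.net"

old_tag = a_tag.i.replace_with(new_tag)   # returns the removed <i>

print(a_tag)     # <a href="http://example.com/">I linked to <b>example.net</b></a>
print(old_tag)   # <i>example.com</i>
```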


.wrap() wraps an element in the tag you specify and returns the new wrapper
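
A minimal sketch of wrapping a string in a new <b> tag (invented markup, html.parser assumed):

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
soup = BeautifulSoup("<p>I wish I was bold.</p>", "html.parser")

wrapper = soup.p.string.wrap(soup.new_tag("b"))   # returns the new <b>

print(wrapper)    # <b>I wish I was bold.</b>
print(soup.p)     # <p><b>I wish I was bold.</b></p>
```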

Filters:

def has_class_but_no_id(tag):
    return tag.has_attr('class') and not tag.has_attr('id')


The filters used inside the search methods can take various forms: a string, a regex (re.compile("regex")), a list, True (which matches every tag it can), or a function that returns True if the right tag was found and False if not.
The function above returns True if a tag defines the class attribute but doesn't define the id attribute.
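
Each filter form can be tried against the same small document. This is only a sketch: the markup is invented and html.parser is an assumed parser choice.

```python
import re
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
html = '<body><b class="x">bold</b><a class="y" id="l">link</a><p>para</p></body>'
soup = BeautifulSoup(html, "html.parser")

def has_class_but_no_id(tag):
    return tag.has_attr("class") and not tag.has_attr("id")

by_string = [t.name for t in soup.find_all("b")]
by_regex = [t.name for t in soup.find_all(re.compile("^b"))]
by_list = [t.name for t in soup.find_all(["a", "b"])]
by_true = [t.name for t in soup.find_all(True)]
by_function = [t.name for t in soup.find_all(has_class_but_no_id)]

print(by_string)     # ['b']
print(by_regex)      # ['body', 'b']
print(by_list)       # ['b', 'a']
print(by_true)       # ['body', 'b', 'a', 'p']
print(by_function)   # ['b']
```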

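from bs4 import NavigableString

def surrounded_by_strings(tag):
    # True when the elements parsed just before and just after the tag are strings
    return (isinstance(tag.next_element, NavigableString)
            and isinstance(tag.previous_element, NavigableString))

for tag in soup.find_all(surrounded_by_strings):
    print(tag.name)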

The function above returns True if a tag is surrounded by string objects.

soup.find('p', {'style': 'display:inline'})

The filters can become quite specific. The call above gets the first p element whose style attribute is set to 'display:inline'.


Or if an attribute has a certain string inside (using regex):
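
A sketch of both kinds of attribute filter, exact value and regex, on invented markup (html.parser assumed). The regex matches if the pattern is found anywhere in the attribute value:

```python
import re
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
html = ('<p style="display:inline">inline</p>'
        '<p style="color:red; display:block">block</p>')
soup = BeautifulSoup(html, "html.parser")

exact = soup.find("p", {"style": "display:inline"})      # whole value must match
partial = soup.find_all("p", style=re.compile("display"))  # pattern found inside

exact_text = exact.get_text()
partial_texts = [p.get_text() for p in partial]
print(exact_text)      # inline
print(partial_texts)   # ['inline', 'block']
```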


def has_six_characters(css_class):
    return css_class is not None and len(css_class) == 6


As with any keyword argument, you can pass class_ a string, a regular expression (re.compile(regex)), a function, or True. The function above matches any CSS class that is exactly six characters long.
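
The four forms of class_ can be compared on a few invented paragraphs (html.parser is an assumed parser choice):

```python
import re
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
html = '<p class="sister">a</p><p class="title">b</p><p>c</p>'
soup = BeautifulSoup(html, "html.parser")

def has_six_characters(css_class):
    return css_class is not None and len(css_class) == 6

by_string = [p.get_text() for p in soup.find_all(class_="title")]
by_regex = [p.get_text() for p in soup.find_all(class_=re.compile("itl"))]
by_function = [p.get_text() for p in soup.find_all(class_=has_six_characters)]
by_true = [p.get_text() for p in soup.find_all(class_=True)]

print(by_string)     # ['b']
print(by_regex)      # ['b']
print(by_function)   # ['a']
print(by_true)       # ['a', 'b']
```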