HtmlUnit – Getting Started with HtmlUnit

Content

Introduction
Submitting a form
Finding a specific element

Introduction

The dependencies page lists all the jars that you will need to have in your classpath.

The class org.htmlunit.WebClient is the main starting point. This simulates a web browser and will be used to execute all of the tests. (see WebClient - the browser)

Android
Using HtmlUnit on Android poses some challenges because of subtle technical differences in Java on Android. Because of this, we offer a customized distribution that works around these problems. Please check out htmlunit-android on GitHub.

Most unit testing will be done within a framework like JUnit so all the examples here will assume that we are using that.

In the first sample, we create the web client and have it load the homepage from the HtmlUnit website. We then verify that this page has the correct title. Note that getPage() can return different types of pages based on the content type of the returned data. In this case, we are expecting a content type of text/html, so we cast the result to an org.htmlunit.html.HtmlPage.

@Test
public void homePage() throws Exception {
    try (final WebClient webClient = new WebClient()) {
        final HtmlPage page = webClient.getPage("https://www.htmlunit.org/");
        Assert.assertEquals("HtmlUnit – Welcome to HtmlUnit", page.getTitleText());

        final String pageAsXml = page.asXml();
        Assert.assertTrue(pageAsXml.contains("<body class=\"topBarDisabled\">"));

        final String pageAsText = page.asNormalizedText();
        Assert.assertTrue(pageAsText.contains("Support for the HTTP and HTTPS protocols"));
    }
}
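To illustrate how the content type decides the kind of page returned by getPage(), here is a minimal sketch that serves two responses from memory through MockWebConnection instead of the network. It assumes HtmlUnit 3.x on the classpath; the class name PageTypeSketch, the helper describe() and the localhost URL are placeholders made up for this example.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URL;

import org.htmlunit.MockWebConnection;
import org.htmlunit.Page;
import org.htmlunit.TextPage;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;

public class PageTypeSketch {

    /** Loads the given body with the given content type and reports the page type. */
    static String describe(final String body, final String contentType) throws IOException {
        try (final WebClient webClient = new WebClient()) {
            // Placeholder URL; nothing is fetched from the network.
            final URL url = URI.create("http://localhost/sample").toURL();

            // Answer the request from memory with the chosen content type.
            final MockWebConnection connection = new MockWebConnection();
            connection.setResponse(url, body, contentType);
            webClient.setWebConnection(connection);

            final Page page = webClient.getPage(url);
            if (page.isHtmlPage()) {
                // text/html: safe to cast to HtmlPage
                return "html:" + ((HtmlPage) page).getTitleText();
            }
            if (page instanceof TextPage) {
                // text/plain: HtmlUnit creates a TextPage
                return "text:" + ((TextPage) page).getContent();
            }
            return "other:" + page.getClass().getSimpleName();
        }
    }

    public static void main(final String[] args) throws IOException {
        System.out.println(describe(
                "<html><head><title>Hi</title></head><body></body></html>", "text/html"));
        System.out.println(describe("just text", "text/plain"));
    }
}
```

Checking the type before casting avoids a ClassCastException when a server answers with something other than HTML.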

Submitting a form

Frequently we want to change values in a form and submit the form back to the server. The following example shows how you might do this.

@Test
public void submittingForm() throws Exception {
    try (final WebClient webClient = new WebClient()) {

        // Get the first page
        final HtmlPage page1 = webClient.getPage("http://some_url");

        // Get the form that we are dealing with and within that form, 
        // find the submit button and the field that we want to change.
        final HtmlForm form = page1.getFormByName("myform");

        final HtmlSubmitInput button = form.getInputByName("submitbutton");
        final HtmlTextInput textField = form.getInputByName("userid");

        // Change the value of the text field
        textField.type("root");

        // Now submit the form by clicking the button and get back the second page.
        final HtmlPage page2 = button.click();
    }
}

Finding form elements

To fill out a form, you first have to find the form elements you want to interact with.

final HtmlTextInput textField = form.getInputByName("userid");

In addition to all the general ways of finding DOM elements (see below), the HtmlForm object offers some convenience methods to find form elements:

HtmlForm.getButtonByName(String)
HtmlForm.getButtonsByName(String)
HtmlForm.getCheckedRadioButton(String)
HtmlForm.getInputByName(String)
HtmlForm.getInputByValue(String)
HtmlForm.getInputsByName(String)
HtmlForm.getInputsByValue(String)
HtmlForm.getRadioButtonsByName(String)
HtmlForm.getSelectByName(String)
HtmlForm.getSelectsByName(String)
HtmlForm.getTextAreaByName(String)
HtmlForm.getTextAreasByName(String)
HtmlForm.getElements()

All these methods work on a list of all DOM elements associated with this form. This list includes all descendants of the form element AND all other elements associated with the form through the 'form' attribute. The method HtmlForm.getElements() builds this list, and all other methods use it as the basis for further filtering.
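The following sketch shows the association through the 'form' attribute described above. It assumes HtmlUnit 3.x and loads a made-up HTML snippet with WebClient.loadHtmlCodeIntoCurrentWindow(String), so no server is needed; the class name, the markup and the field names are hypothetical.

```java
import java.io.IOException;

import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlForm;
import org.htmlunit.html.HtmlPage;
import org.htmlunit.html.HtmlTextInput;

public class FormElementsSketch {

    // Hypothetical markup: 'userid' is a descendant of the form,
    // 'nickname' lives outside but points to the form via form='f1'.
    static final String HTML = "<html><body>"
            + "<form id='f1' name='myform' action='#'>"
            + "<input type='text' name='userid'>"
            + "</form>"
            + "<input type='text' name='nickname' form='f1'>"
            + "</body></html>";

    /** Looks up both inputs through the form and returns their names. */
    static String[] findInputNames() throws IOException {
        try (final WebClient webClient = new WebClient()) {
            final HtmlPage page = webClient.loadHtmlCodeIntoCurrentWindow(HTML);
            final HtmlForm form = page.getFormByName("myform");

            // Both lookups go through the form, not through the page.
            final HtmlTextInput inside = form.getInputByName("userid");
            final HtmlTextInput outside = form.getInputByName("nickname");

            return new String[] {inside.getNameAttribute(), outside.getNameAttribute()};
        }
    }
}
```

If an element cannot be found, getInputByName(String) throws an ElementNotFoundException, so a failed lookup is easy to spot in a test.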

Text input <input type='text'>

These form elements are represented as instances of class HtmlTextInput.

final HtmlTextInput textField = form.getInputByName("userid");

To replace the value with some new text you should use the method HtmlElement#type(String). This call takes care of setting the focus (if required; including triggering all the focus related events) and then simulating the typing of the provided string (char by char, including the keyboard events).

textField.type("RBRi");

If the events are not needed, you can also use the method HtmlSelectableTextInput#setValue(String).
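The difference between the two approaches can be sketched as follows, assuming HtmlUnit 3.x with JavaScript enabled (the default). The markup is made up for this example: both fields carry an onkeyup handler that marks the field with a hypothetical 'data-typed' attribute, so it becomes visible whether keyboard events were fired.

```java
import java.io.IOException;

import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlForm;
import org.htmlunit.html.HtmlPage;
import org.htmlunit.html.HtmlTextInput;

public class TextInputSketch {

    // Hypothetical form; the handler records that a keyup event arrived.
    static final String HTML = "<html><body><form name='myform' action='#'>"
            + "<input type='text' name='userid'"
            + " onkeyup=\"this.setAttribute('data-typed', 'yes')\">"
            + "<input type='text' name='city'"
            + " onkeyup=\"this.setAttribute('data-typed', 'yes')\">"
            + "</form></body></html>";

    /** Fills one field with type() and one with setValue() and reports the result. */
    static String fill() throws IOException {
        try (final WebClient webClient = new WebClient()) {
            final HtmlPage page = webClient.loadHtmlCodeIntoCurrentWindow(HTML);
            final HtmlForm form = page.getFormByName("myform");

            // type() simulates the key strokes, including the keyboard events
            final HtmlTextInput userId = form.getInputByName("userid");
            userId.type("RBRi");

            // setValue() only changes the value, no key strokes are simulated
            final HtmlTextInput city = form.getInputByName("city");
            city.setValue("Berlin");

            return userId.getValue() + ":" + userId.hasAttribute("data-typed")
                    + " " + city.getValue() + ":" + city.hasAttribute("data-typed");
        }
    }
}
```

Use type(String) when the page reacts to keyboard events (validation, autocomplete and so on); setValue(String) is enough when only the submitted value matters.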

Text area <textarea>

These form elements are represented as instances of class HtmlTextArea.

final HtmlTextArea textArea = form.getTextAreaByName("comment");

The usage of HtmlTextArea is similar to HtmlTextInput (because both are derived from HtmlSelectableTextInput). This means you can also use type(String) or even setValue(String) to update these elements.

textArea.type("HtmlUnit is a great library...");

Radio buttons <input type='radio'> and Checkboxes <input type='checkbox'>

These form elements are represented as instances of class HtmlRadioButtonInput/HtmlCheckBoxInput.

final HtmlRadioButtonInput countryGermany = form.getInputByName("radio_country_germany");
final HtmlCheckBoxInput programmingLanguage = form.getInputByName("check_language_java");

Usually your form contains many of these elements organized in groups. To check a radio button or a checkbox you have to use HtmlRadioButtonInput#setChecked(boolean) or HtmlCheckBoxInput#setChecked(boolean).

countryGermany.setChecked(true);
programmingLanguage.setChecked(true);

Checking a single radio button will automatically uncheck all other radio buttons in the same group.
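A small sketch of the group behaviour, assuming HtmlUnit 3.x; the markup, the ids and the class name are invented for this example. The radio button 'de' starts out checked and is replaced by 'fr' once that one is checked.

```java
import java.io.IOException;

import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlCheckBoxInput;
import org.htmlunit.html.HtmlForm;
import org.htmlunit.html.HtmlPage;
import org.htmlunit.html.HtmlRadioButtonInput;

public class RadioGroupSketch {

    // Hypothetical form with one radio group and one checkbox.
    static final String HTML = "<html><body><form name='myform' action='#'>"
            + "<input type='radio' name='country' value='de' id='country_de' checked>"
            + "<input type='radio' name='country' value='fr' id='country_fr'>"
            + "<input type='checkbox' name='language' value='java' id='lang_java'>"
            + "</form></body></html>";

    /** Checks 'fr' and the checkbox, then reports the state of the form. */
    static String select() throws IOException {
        try (final WebClient webClient = new WebClient()) {
            final HtmlPage page = webClient.loadHtmlCodeIntoCurrentWindow(HTML);
            final HtmlForm form = page.getFormByName("myform");

            // Checking 'fr' unchecks 'de' because both share the name 'country'.
            final HtmlRadioButtonInput france = page.getHtmlElementById("country_fr");
            france.setChecked(true);

            final HtmlCheckBoxInput javaBox = page.getHtmlElementById("lang_java");
            javaBox.setChecked(true);

            // Ask the form which radio button of the group is checked now.
            final HtmlRadioButtonInput checked = form.getCheckedRadioButton("country");
            final HtmlRadioButtonInput germany = page.getHtmlElementById("country_de");
            return checked.getValueAttribute() + "," + germany.isChecked() + "," + javaBox.isChecked();
        }
    }
}
```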

Select <select>

These form elements are represented as instances of class HtmlSelect. The individual options are represented by instances of class HtmlOption.

final HtmlSelect currency = form.getSelectByName("currency");

The simplest way to select one of the options is the method HtmlSelect#setSelectedIndex(int).

currency.setSelectedIndex(1);

To make your code more readable and robust, you can search for the HtmlOption to select and then use HtmlSelect#setSelectedAttribute(HtmlOption, boolean).

HtmlOption euro = currency.getOptionByValue("Euro");
currency.setSelectedAttribute(euro, true);

For single selection select elements, this call also deselects all other options.
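The deselection in a single-selection list can be sketched like this, assuming HtmlUnit 3.x. The markup is hypothetical: 'USD' is preselected and gets replaced once 'Euro' is selected.

```java
import java.io.IOException;
import java.util.List;

import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlForm;
import org.htmlunit.html.HtmlOption;
import org.htmlunit.html.HtmlPage;
import org.htmlunit.html.HtmlSelect;

public class SelectSketch {

    // Hypothetical single-selection list with 'USD' preselected.
    static final String HTML = "<html><body><form name='myform' action='#'>"
            + "<select name='currency'>"
            + "<option value='USD' selected>US Dollar</option>"
            + "<option value='Euro'>Euro</option>"
            + "</select>"
            + "</form></body></html>";

    /** Selects the 'Euro' option and reports what is selected afterwards. */
    static String choose() throws IOException {
        try (final WebClient webClient = new WebClient()) {
            final HtmlPage page = webClient.loadHtmlCodeIntoCurrentWindow(HTML);
            final HtmlForm form = page.getFormByName("myform");

            final HtmlSelect currency = form.getSelectByName("currency");
            final HtmlOption euro = currency.getOptionByValue("Euro");
            currency.setSelectedAttribute(euro, true);

            // Only one option stays selected in a single-selection list.
            final List<HtmlOption> selected = currency.getSelectedOptions();
            return selected.size() + ":" + selected.get(0).getValueAttribute();
        }
    }
}
```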

Finding a specific element

Once you have a reference to an HtmlPage, you can search for a specific HtmlElement by one of the 'get' methods, or by using XPath or CSS selectors.

Traversing the DOM tree

Below is an example of finding a 'div' by an ID, and getting an anchor by name:

@Test
public void getElements() throws Exception {
    try (final WebClient webClient = new WebClient()) {
        final HtmlPage page = webClient.getPage("http://some_url");
        final HtmlDivision div = page.getHtmlElementById("some_div_id");
        final HtmlAnchor anchor = page.getAnchorByName("anchor_name");
    }
}

A simple way for finding elements might be to find all elements of a specific type.

@Test
public void getElements() throws Exception {
    try (final WebClient webClient = new WebClient()) {
        final HtmlPage page = webClient.getPage("http://some_url");
        final DomNodeList<DomElement> inputs = page.getElementsByTagName("input");
        final Iterator<DomElement> nodesIterator = inputs.iterator();
        // now iterate
    }
}

There is a rich set of methods for locating page elements, e.g.:

HtmlPage.getAnchors(); HtmlPage.getAnchorByHref(String); HtmlPage.getAnchorByName(String); HtmlPage.getAnchorByText(String)
HtmlPage.getElementById(String); HtmlPage.getElementsById(String); HtmlPage.getElementsByIdAndOrName(String)
HtmlPage.getElementByName(String); HtmlPage.getElementsByName(String)
HtmlPage.getFormByName(String); HtmlPage.getForms()
HtmlPage.getFrameByName(String); HtmlPage.getFrames()

You can also start searching from the document element (HtmlPage.getDocumentElement()) and then traverse the DOM tree:

HtmlElement.getElementsByAttribute(String, String, String)
DomElement.getElementsByTagName(String); DomElement.getElementsByTagNameNS(String, String)
DomElement.getChildElements(); DomElement.getChildElementCount()
DomElement.getFirstElementChild(); DomElement.getLastElementChild()
HtmlElement.getEnclosingElement(String); HtmlElement.getEnclosingForm()
DomNode.getChildNodes(); DomNode.getChildren(); DomNode.getDescendants()
DomNode.getDomElementDescendants(); DomNode.getHtmlElementDescendants()
DomNode.getFirstChild(); DomNode.getLastChild()
DomNode.getNextElementSibling(); DomNode.getNextSibling()
DomNode.getPreviousElementSibling(); DomNode.getPreviousSibling()

XPath queries

XPath is the suggested way for more complex searches; a brief tutorial can be found at W3Schools.

@Test
public void xpath() throws Exception {
    try (final WebClient webClient = new WebClient()) {
        final HtmlPage page = webClient.getPage("https://htmlunit.sourceforge.io/");

        //get list of all divs
        final List<?> divs = page.getByXPath("//div");

        //get div which has an 'id' attribute of 'banner'
        final HtmlDivision div = (HtmlDivision) page.getByXPath("//div[@id='banner']").get(0);
    }
}
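Calling get(0) on the result list throws an IndexOutOfBoundsException when nothing matches. As a hedged alternative, this sketch (assuming HtmlUnit 3.x) uses getFirstByXPath(String), which returns null in that case. The class name, the helper bannerText() and the HTML passed in are placeholders for this example.

```java
import java.io.IOException;
import java.util.List;

import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlDivision;
import org.htmlunit.html.HtmlPage;

public class XPathSketch {

    /** Counts all divs and returns the text of the 'banner' div, or "none". */
    static String bannerText(final String html) throws IOException {
        try (final WebClient webClient = new WebClient()) {
            final HtmlPage page = webClient.loadHtmlCodeIntoCurrentWindow(html);

            // All divs; the list is empty when nothing matches.
            final List<HtmlDivision> divs = page.getByXPath("//div");

            // First match only; null when nothing matches.
            final HtmlDivision banner = page.getFirstByXPath("//div[@id='banner']");

            return divs.size() + ":" + (banner == null ? "none" : banner.getTextContent());
        }
    }

    public static void main(final String[] args) throws IOException {
        System.out.println(bannerText(
                "<html><body><div id='banner'>Welcome</div><div>Other</div></body></html>"));
        System.out.println(bannerText("<html><body><div>Only</div></body></html>"));
    }
}
```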

CSS Selectors

You can also use CSS selectors:

@Test
public void cssSelector() throws Exception {
    try (final WebClient webClient = new WebClient()) {
        final HtmlPage page = webClient.getPage("https://www.htmlunit.org/");

        //get list of all divs
        final DomNodeList<DomNode> divs = page.querySelectorAll("div");
        for (DomNode div : divs) {
            // ...
        }

        //get div which has the id 'breadcrumbs'
        final DomNode div = page.querySelector("div#breadcrumbs");
    }
}
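For experimenting with selectors without a network connection, here is a minimal offline sketch, assuming HtmlUnit 3.x. The markup and the class name are made up; note that querySelector(String) returns null when no element matches.

```java
import java.io.IOException;

import org.htmlunit.WebClient;
import org.htmlunit.html.DomNode;
import org.htmlunit.html.DomNodeList;
import org.htmlunit.html.HtmlPage;

public class CssSelectorSketch {

    // Hypothetical page with one div selected by id and two by class.
    static final String HTML = "<html><body>"
            + "<div id='breadcrumbs'>Home</div>"
            + "<div class='note'>First</div>"
            + "<div class='note'>Second</div>"
            + "</body></html>";

    /** Runs three selector queries and summarizes the results. */
    static String query() throws IOException {
        try (final WebClient webClient = new WebClient()) {
            final HtmlPage page = webClient.loadHtmlCodeIntoCurrentWindow(HTML);

            // All divs carrying the class 'note'
            final DomNodeList<DomNode> notes = page.querySelectorAll("div.note");

            // The single div with the id 'breadcrumbs'
            final DomNode breadcrumbs = page.querySelector("div#breadcrumbs");

            // No match: querySelector returns null instead of throwing
            final DomNode missing = page.querySelector("div#does-not-exist");

            return notes.size() + ":" + breadcrumbs.getTextContent() + ":" + (missing == null);
        }
    }
}
```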