Introduction

The WebClient represents the browser when you work with HtmlUnit. To start using HtmlUnit you have to instantiate a new WebClient, much like starting the browser in the real world.

WebClient implements AutoCloseable; you should always use it in a try-with-resources statement. After a WebClient is closed (see WebClient.close()), any further use is not supported and might lead to exceptions or incorrect behaviour.

try (final WebClient webClient = new WebClient()) {
    // now you have a running browser, and you can start doing real things
    // like going to a web page
    final HtmlPage page = webClient.getPage("https://www.htmlunit.org/");
}

Imitating a specific browser

Often you will want to simulate a specific browser. This is done by passing an org.htmlunit.BrowserVersion into the WebClient constructor. Constants are provided for some common browsers.

@Test
public void homePage_Firefox() throws Exception {
    try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
        final HtmlPage page = webClient.getPage("https://www.htmlunit.org/");
        Assert.assertEquals("HtmlUnit – Welcome to HtmlUnit", page.getTitleText());
    }
}

Specifying this BrowserVersion will change

  • the user agent HTTP header,
  • the values and the order of many other HTTP headers,
  • the list of supported MIME types,
  • the behavior of the web client,
  • the supported JavaScript methods and the behaviour of some JavaScript functions, and
  • the default values for various CSS properties
to match the real browsers.

In most cases, it should be sufficient to use the predefined BrowserVersion constants.
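To see what a predefined constant actually carries, the values can be read back from the BrowserVersion itself. The following is a minimal sketch; the class name BrowserVersionComparison and the helper describe are made up for illustration, and only the getters getNickname() and getUserAgent() are used.

```java
import org.htmlunit.BrowserVersion;

// Illustrative helper (hypothetical name): prints what two predefined
// BrowserVersion constants report about themselves.
class BrowserVersionComparison {

    // Builds a one-line summary from BrowserVersion getters only.
    static String describe(final BrowserVersion version) {
        return version.getNickname() + " -> " + version.getUserAgent();
    }

    public static void main(final String[] args) {
        // No network access is needed; the constants are plain value objects.
        System.out.println(describe(BrowserVersion.FIREFOX));
        System.out.println(describe(BrowserVersion.CHROME));
    }
}
```

The exact strings depend on the HtmlUnit release in use, so the sketch prints them rather than assuming particular values.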

Using the options to adjust the browser

There are various options available to make fine-grained adjustments to the browser.

@Test
public void homePage_Firefox() throws Exception {
    try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
        // disable JavaScript
        webClient.getOptions().setJavaScriptEnabled(false);
        // disable CSS support
        webClient.getOptions().setCssEnabled(false);

        final HtmlPage page = webClient.getPage("https://www.htmlunit.org/");
        Assert.assertEquals("HtmlUnit – Welcome to HtmlUnit", page.getTitleText());
    }
}

The default values for most options match those of real browsers, with one important exception:
HtmlUnit stops JavaScript execution at the first unhandled exception, whereas real browsers carry on. You can change this by setting the throwExceptionOnScriptError option to false.

@Test
public void homePage_Firefox() throws Exception {
    try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
        // continue JavaScript execution after unhandled script errors
        webClient.getOptions().setThrowExceptionOnScriptError(false);

        final HtmlPage page = webClient.getPage("https://www.htmlunit.org/");
        Assert.assertEquals("HtmlUnit – Welcome to HtmlUnit", page.getTitleText());
    }
}

For a complete list and more details please have a look at the WebClientOptions API.
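Several options are often adjusted together. The sketch below applies a few of them in one helper and reads them back through the matching getters; the class name OptionsSketch and the chosen values (for example the 10 second timeout) are arbitrary assumptions for illustration, not recommendations.

```java
import org.htmlunit.BrowserVersion;
import org.htmlunit.WebClient;
import org.htmlunit.WebClientOptions;

// Illustrative helper (hypothetical name): groups option changes in one place.
class OptionsSketch {

    // Applies a handful of option changes to the given client.
    static void configure(final WebClient webClient) {
        final WebClientOptions options = webClient.getOptions();
        options.setJavaScriptEnabled(false);
        options.setCssEnabled(false);
        // do not throw when the server answers with a failing status code
        options.setThrowExceptionOnFailingStatusCode(false);
        // timeout in milliseconds (value picked for this example only)
        options.setTimeout(10_000);
    }

    // Reads the options back so the effect of configure() is visible.
    static String summary(final WebClient webClient) {
        final WebClientOptions options = webClient.getOptions();
        return "js=" + options.isJavaScriptEnabled()
                + " css=" + options.isCssEnabled()
                + " failOnStatus=" + options.isThrowExceptionOnFailingStatusCode()
                + " timeoutMs=" + options.getTimeout();
    }

    // Creates a client, configures it and returns the summary.
    static String configuredSummary() {
        try (WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
            configure(webClient);
            return summary(webClient);
        }
    }

    public static void main(final String[] args) {
        // prints: js=false css=false failOnStatus=false timeoutMs=10000
        System.out.println(configuredSummary());
    }
}
```

Keeping the option changes in a single method makes it easy to share one configuration between tests.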

Change the browser language / time zone

The language and time zone cannot be changed through the options; they must be set before the WebClient is created.
All BrowserVersions ship with 'en-US' as the language and 'America/New_York' as the time zone.
To change these defaults, create a customised copy of the corresponding BrowserVersion using the BrowserVersionBuilder. This new BrowserVersion can then be used to create a WebClient.

final BrowserVersion.BrowserVersionBuilder builder = new BrowserVersion.BrowserVersionBuilder(BrowserVersion.FIREFOX);

builder.setSystemTimezone(TimeZone.getTimeZone("Europe/Berlin"));
builder.setBrowserLanguage("de-DE");
builder.setAcceptLanguageHeader("de-DE,de");

final BrowserVersion germanFirefox = builder.build();
try (final WebClient webClient = new WebClient(germanFirefox)) {
    // ...
}

There is no support for changing the language/timezone after the WebClient has been created.
For more details please have a look at the BrowserVersion.BrowserVersionBuilder API.
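One detail of the JDK is worth keeping in mind when choosing the time zone: java.util.TimeZone.getTimeZone(String) does not reject an unknown ID but quietly returns GMT, so a typo would go unnoticed. The sketch below uses only the JDK and shows a stricter lookup via java.time.ZoneId; the class name TimeZoneCheck and the helper strictTimeZone are hypothetical.

```java
import java.time.DateTimeException;
import java.time.ZoneId;
import java.util.TimeZone;

// Illustrative helper (hypothetical name): strict time zone lookup.
class TimeZoneCheck {

    // ZoneId.of throws a DateTimeException for unknown region ids,
    // so a misspelled id fails early instead of turning into GMT.
    static TimeZone strictTimeZone(final String id) {
        return TimeZone.getTimeZone(ZoneId.of(id));
    }

    public static void main(final String[] args) {
        // prints: Europe/Berlin
        System.out.println(strictTimeZone("Europe/Berlin").getID());

        // the lenient lookup maps the misspelled id to GMT; prints: GMT
        System.out.println(TimeZone.getTimeZone("Europe/Berln").getID());

        try {
            strictTimeZone("Europe/Berln");
        }
        catch (final DateTimeException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The TimeZone returned by strictTimeZone can then be handed to the builder's setSystemTimezone method.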

Change the browser user agent

Changing the user agent is similar to changing language/time zone (see above).
You have to create a customised copy of the corresponding BrowserVersion using the BrowserVersionBuilder. This adapted BrowserVersion can then be used to create a WebClient.

final BrowserVersion.BrowserVersionBuilder builder = new BrowserVersion.BrowserVersionBuilder(BrowserVersion.FIREFOX);

builder.setUserAgent("Mozilla/5.0 (iPhone; CPU iPhone OS 14_5 like Mac OS X) "
        + "AppleWebKit/605.1.15 (KHTML, like Gecko) FxiOS/128.0 Mobile/15E148 Safari/605.1.15");

final BrowserVersion iosFirefox = builder.build();
try (final WebClient webClient = new WebClient(iosFirefox)) {
    // ...
}

For more details please have a look at the BrowserVersion.BrowserVersionBuilder API.
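Because the builder produces a copy, the predefined constant should stay untouched. A small sketch to check this by reading the user agent back; the class name CustomUserAgentCheck and the user agent string "ExampleAgent/1.0 (hypothetical)" are invented for the example.

```java
import org.htmlunit.BrowserVersion;

// Illustrative helper (hypothetical name): verifies that customising a
// BrowserVersion leaves the predefined constant unchanged.
class CustomUserAgentCheck {

    // Made-up user agent string, used only for this demonstration.
    static final String CUSTOM_UA = "ExampleAgent/1.0 (hypothetical)";

    // Creates a copy of FIREFOX that reports the given user agent.
    static BrowserVersion withUserAgent(final String userAgent) {
        final BrowserVersion.BrowserVersionBuilder builder =
                new BrowserVersion.BrowserVersionBuilder(BrowserVersion.FIREFOX);
        builder.setUserAgent(userAgent);
        return builder.build();
    }

    public static void main(final String[] args) {
        final BrowserVersion custom = withUserAgent(CUSTOM_UA);
        System.out.println("custom:   " + custom.getUserAgent());
        System.out.println("original: " + BrowserVersion.FIREFOX.getUserAgent());
    }
}
```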

Using HtmlUnit behind a proxy

Using an HTTP proxy

There is a special WebClient constructor that allows you to specify proxy server information in those cases where you need to connect through one.

@Test
public void homePage_proxy() throws Exception {
    try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX, PROXY_HOST, PROXY_PORT)) {

        //set proxy username and password 
        final DefaultCredentialsProvider credentialsProvider = (DefaultCredentialsProvider) webClient.getCredentialsProvider();
        credentialsProvider.addCredentials("username", "password");

        final HtmlPage page = webClient.getPage("https://www.htmlunit.org");
        Assert.assertEquals("HtmlUnit - Welcome to HtmlUnit", page.getTitleText());
    }
}

If the proxy server requires credentials, you can define them on the DefaultCredentialsProvider of the WebClient.

@Test
public void homePage_proxy() throws Exception {
    try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX, PROXY_HOST, PROXY_PORT)) {

        //set proxy username and password 
        final DefaultCredentialsProvider credentialsProvider = (DefaultCredentialsProvider) webClient.getCredentialsProvider();
        credentialsProvider.addCredentials("username", "password", PROXY_HOST, PROXY_PORT);

        final HtmlPage page = webClient.getPage("https://www.htmlunit.org");
        Assert.assertEquals("HtmlUnit - Welcome to HtmlUnit", page.getTitleText());
    }
}

SOCKS proxy sample

Setting up a SOCKS proxy is a bit trickier, but in general it follows the same pattern.

@Test
public void homePage_proxy() throws Exception {
    try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
        // SOCKS proxy: the final 'true' argument marks this proxy as a SOCKS proxy
        webClient.getOptions().setProxyConfig(new ProxyConfig(SOCKS_PROXY_HOST, SOCKS_PROXY_PORT, null, true));

        //set proxy username and password if required
        final DefaultCredentialsProvider credentialsProvider = (DefaultCredentialsProvider) webClient.getCredentialsProvider();
        credentialsProvider.addSocksCredentials("username", "password", SOCKS_PROXY_HOST, SOCKS_PROXY_PORT);

        final HtmlPage page = webClient.getPage("https://www.htmlunit.org");
        Assert.assertEquals("HtmlUnit - Welcome to HtmlUnit", page.getTitleText());
    }
}

WebWindowListener / WebWindowEvents

If you wish to be notified when windows are created or pages are loaded, register a WebWindowListener with the WebClient using WebClient.addWebWindowListener(WebWindowListener).

When a window is opened either by JavaScript or through the WebClient, a WebWindowEvent will be fired and passed into the WebWindowListener.webWindowOpened(WebWindowEvent) method. Note that both the new and old pages in the event will be null as the window does not have any content loaded at this point. If a URL was specified during creation of the window then the page will be loaded and another event will be fired as described below.

When a new page is loaded into a specific window, a WebWindowEvent will be fired and passed into the WebWindowListener.webWindowContentChanged(WebWindowEvent) method.
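The events described above can be observed with a small recording listener. This is a sketch under a few assumptions: it uses MockWebConnection to serve a canned page so that no network is needed, the URL http://localhost/ is only a placeholder, and the class name WindowEventLog is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.htmlunit.BrowserVersion;
import org.htmlunit.MockWebConnection;
import org.htmlunit.WebClient;
import org.htmlunit.WebWindowEvent;
import org.htmlunit.WebWindowListener;

// Illustrative helper (hypothetical name): records window events as strings.
class WindowEventLog {

    static List<String> loadAndRecord() {
        final List<String> log = new ArrayList<>();
        try (WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
            // serve a fixed page for every request instead of using the network
            final MockWebConnection connection = new MockWebConnection();
            connection.setDefaultResponse(
                    "<html><head><title>demo</title></head><body></body></html>");
            webClient.setWebConnection(connection);

            webClient.addWebWindowListener(new WebWindowListener() {
                @Override
                public void webWindowOpened(final WebWindowEvent event) {
                    log.add("opened");
                }

                @Override
                public void webWindowContentChanged(final WebWindowEvent event) {
                    log.add("contentChanged");
                }

                @Override
                public void webWindowClosed(final WebWindowEvent event) {
                    log.add("closed");
                }
            });

            // loading a page into the current window triggers a content change
            webClient.getPage("http://localhost/");
        }
        catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
        return log;
    }

    public static void main(final String[] args) {
        System.out.println(loadAndRecord());
    }
}
```

The listener is registered after the WebClient has created its initial window, so the sketch does not rely on seeing an "opened" entry for that window.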

Using handlers

The WebClient uses many handlers for special purposes. Each handler implements a specific interface, and you can replace it with your own implementation. Default implementations are also available.

AlertHandler

The handler that processes JavaScript alerts, which are triggered when the JavaScript method window.alert() is called.

ConfirmHandler

The handler for the JavaScript function window.confirm().
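As a sketch of how the alert and confirm handlers can work together: the page below calls confirm() and passes the result to alert(), the ConfirmHandler answers with false, and a CollectingAlertHandler records the alert. The canned page is served through MockWebConnection; the class name DialogHandlers and the placeholder URL are assumptions made for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

import org.htmlunit.BrowserVersion;
import org.htmlunit.CollectingAlertHandler;
import org.htmlunit.MockWebConnection;
import org.htmlunit.WebClient;

// Illustrative helper (hypothetical name): wires an alert and a confirm handler.
class DialogHandlers {

    static List<String> run() {
        final String html = "<html><head><title>demo</title></head><body>"
                + "<script>alert(confirm('Proceed?'));</script>"
                + "</body></html>";

        final CollectingAlertHandler alerts = new CollectingAlertHandler();
        try (WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
            // serve the test page without touching the network
            final MockWebConnection connection = new MockWebConnection();
            connection.setDefaultResponse(html);
            webClient.setWebConnection(connection);

            // collect every alert message instead of only logging it
            webClient.setAlertHandler(alerts);
            // answer every confirm dialog with "cancel"
            webClient.setConfirmHandler((page, message) -> false);

            webClient.getPage("http://localhost/");
        }
        catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
        return alerts.getCollectedAlerts();
    }

    public static void main(final String[] args) {
        // prints: [false]
        System.out.println(run());
    }
}
```

The collected alert contains the string form of the confirm result, which makes the handler's decision easy to assert in a test.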

PromptHandler

The handler for the JavaScript function window.prompt().

StatusHandler

A handler for changes to window.status.

AttachmentHandler

A handler for attachments, which represent pages received from the server which contain Content-Disposition=attachment headers.

ClipboardHandler

A handler for clipboard access.

PrintHandler

A handler for providing Window.print() implementations.

WebStartHandler

A handler for webstart support.

FrameContentHandler

A handler that decides whether or not to load the frame content.

CSSErrorHandler

A handler for processing CSS parser errors.

OnbeforeunloadHandler

A handler for onbeforeunload events.

RefreshHandler

A handler for page refreshes.

Polyfills

The number of JavaScript APIs supported by browsers seems to increase every day. Because of the limited development resources of the HtmlUnit project, keeping up with this is really hard.
But there are already many polyfills available (written to add API support to older browsers). The idea is to use some of these polyfills to add the missing APIs.
Starting with version 2.59.0, HtmlUnit supports the integration of polyfills; there is a dedicated option for every supported polyfill (disabled by default), and if the option is enabled, the polyfill is loaded automatically.

@Test
public void fetchSupport() throws Exception {
    try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
        // enable the Fetch API polyfill
        webClient.getOptions().setFetchPolyfillEnabled(true);

        final HtmlPage page = webClient.getPage(....);
    }
}

Fetch API Polyfill

window.fetch polyfill

webClient.getOptions().setFetchPolyfillEnabled(true);