Request headers

HtmlUnit mimics the browser as closely as possible, and this includes the request headers it sends. If needed, you can change them at three levels: the BrowserVersion level, the client level, and the request level.

BrowserVersion level

To change a request header at the BrowserVersion level, create your own customized browser version using the BrowserVersionBuilder.

final BrowserVersion browser =
    new BrowserVersion.BrowserVersionBuilder(BrowserVersion.FIREFOX)
          .setAcceptLanguageHeader("de-CH")
          .build();

final WebClient webClient = new WebClient(browser);
....

Many builder methods are available to customize basic browser behavior, including:

  • setApplicationCodeName(String)
  • setApplicationMinorVersion(String)
  • setApplicationName(String)
  • setApplicationVersion(String)
  • setBuildId(String)
  • setPlatform(String)
  • setSystemLanguage(String)
  • setSystemTimezone(TimeZone)
  • setUserAgent(String)
  • setVendor(String)
  • setUserLanguage(String)
  • setBrowserLanguage(String)
  • setAcceptEncodingHeader(String)
  • setAcceptLanguageHeader(String)
  • setCssAcceptHeader(String)
  • setHtmlAcceptHeader(String)
  • setImgAcceptHeader(String)
  • setScriptAcceptHeader(String)
  • setXmlHttpRequestAcceptHeader(String)

WebClient level

To change request headers at the client level, use WebClient.addRequestHeader(). You can add additional headers to every request made by this client or overwrite the default ones.
Example: add an additional header to every client request

client.addRequestHeader("from htmlunit", "yes");

Example: replace the default accept-language header for all requests made by this client.

client.addRequestHeader(HttpHeader.ACCEPT_LANGUAGE, "fr");

Example: the same replacement, with the header value taken from a variable (here fromClient).

client.addRequestHeader(HttpHeader.ACCEPT_LANGUAGE, fromClient);

Request level

It is also possible to add or overwrite a request header for a single request. Example:

WebRequest wr = new WebRequest(URL_FIRST);
wr.setAdditionalHeader("from htmlunit", "yes");
....
client.getPage(wr);

Animations based on Window.requestAnimationFrame()

All browsers supported by HtmlUnit are able to do animations based on the Window.requestAnimationFrame() API. A typical example of this is Chart.js. This kind of animation is not triggered automatically because HtmlUnit is headless. The JavaScript part of the API is implemented, but the user of the HtmlUnit library has to force the triggering of the callback(s).

Example:

try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
    HtmlPage page = webClient.getPage(uri);
    webClient.waitForBackgroundJavaScript(1_000);

    // page is loaded and async js done

    // we are now processing the animation
    // (Window is com.gargoylesoftware.htmlunit.javascript.host.Window)
    Window window = page.getEnclosingWindow().getScriptableObject();
    int i = 0; // to be able to limit the animation cycles
    int pendingFrames; // declared outside the loop so the while condition can read it
    do {
        i++;

        // force one animation cycle
        // this invokes all the animation callbacks registered for this
        // window (by calling requestAnimationFrame(Object)) once.
        pendingFrames = window.animateAnimationsFrames();
    } while (pendingFrames > 0 && i < 200);
}

Based on this you have full control over the animation: you can skip all of it, but you can also check the current page state after each single animation step.

Attachments

Normally pages are loaded inline: clicking on a link, for example, loads the linked page in the current window. Attached pages are different in that they are intended to be loaded outside of this flow: clicking on a link prompts the user to either save the linked page, or open it outside of the current window, but does not load the page in the current window.

HtmlUnit complies with the semantics described above when an AttachmentHandler has been registered with the com.gargoylesoftware.htmlunit.WebClient via com.gargoylesoftware.htmlunit.WebClient#setAttachmentHandler(AttachmentHandler). When no attachment handler has been registered with the WebClient, the semantics described above do not apply, and attachments are loaded inline. By default, no AttachmentHandler is registered with a new WebClient instance.
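
Registering a handler could look roughly like the following sketch. It assumes HtmlUnit 2.x (the com.gargoylesoftware.htmlunit packages) on the classpath and uses the CollectingAttachmentHandler that ships with the library; the class name AttachmentSketch, the helper newClient, and the download URL are hypothetical and only serve as illustration.

```java
import com.gargoylesoftware.htmlunit.Page;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.attachment.Attachment;
import com.gargoylesoftware.htmlunit.attachment.CollectingAttachmentHandler;

final class AttachmentSketch {

    // Hypothetical helper: creates a client and registers the given handler,
    // so attachment responses are handed to it instead of being loaded inline.
    static WebClient newClient(CollectingAttachmentHandler handler) {
        WebClient webClient = new WebClient();
        webClient.setAttachmentHandler(handler);
        return webClient;
    }

    public static void main(String[] args) throws Exception {
        CollectingAttachmentHandler handler = new CollectingAttachmentHandler();
        try (WebClient webClient = newClient(handler)) {
            // Hypothetical URL, assumed to answer with an attachment response.
            webClient.getPage("https://example.org/download/report.pdf");

            // Everything the handler received while the client was in use.
            for (Attachment attachment : handler.getCollectedAttachments()) {
                Page attached = attachment.getPage();
                System.out.println(attachment.getSuggestedFilename()
                        + " (" + attached.getWebResponse().getContentType() + ")");
            }
        }
    }
}
```

If the response is not an attachment, the collected list simply stays empty and the page is loaded in the current window as usual.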

Multithreading/Thread Pooling

HtmlUnit uses an Executor backed by a CachedThreadPool for thread handling. This should work fine for common cases. The CachedThreadPool has been in use since 2.54.0 to support scenarios that use many threads, e.g. because of many WebSockets.

Starting with 2.45.0 you can change this by using WebClient.setExecutor(ExecutorService). It might be a good idea to also implement some thread naming to distinguish threads used by HtmlUnit from the rest.
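
The thread-naming suggestion could look roughly like the following sketch. It only uses java.util.concurrent; NamedThreadFactory, NamedExecutorSketch, and the htmlunit-worker prefix are hypothetical names chosen for illustration, and marking the threads as daemon threads is an assumption, not something HtmlUnit requires. The call to WebClient.setExecutor(ExecutorService) appears only as a comment so the sketch compiles without HtmlUnit on the classpath.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of HtmlUnit: gives every thread a
// recognizable name such as "htmlunit-worker-1".
final class NamedThreadFactory implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger(1);

    NamedThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable task) {
        Thread thread = new Thread(task, prefix + "-" + counter.getAndIncrement());
        // Assumption: daemon threads, so idle workers do not keep the JVM alive.
        thread.setDaemon(true);
        return thread;
    }
}

final class NamedExecutorSketch {

    // A cached thread pool, like the default, but with named threads.
    static ExecutorService newHtmlUnitExecutor() {
        return Executors.newCachedThreadPool(new NamedThreadFactory("htmlunit-worker"));
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = newHtmlUnitExecutor();
        // With HtmlUnit on the classpath: webClient.setExecutor(executor);

        // Run one task to show which thread name shows up in dumps and logs.
        Callable<String> probe = () -> Thread.currentThread().getName();
        System.out.println(executor.submit(probe).get(1, TimeUnit.SECONDS));
        executor.shutdown();
    }
}
```

With a prefix like this, HtmlUnit's worker threads are easy to pick out in thread dumps and log output.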