Frequently Asked Questions

  1. I've found a bug, how do I report it?
  2. HtmlUnit doesn't support feature X. How do I request this feature?
  3. What are the most important configurations?
  4. How to accept an old or bad website certificate?
  5. HtmlElement.click() does not triggering anything (e.g. no Javascript event).
  6. When launching my WebClient, I am ending with an js error. But when running on browser, we hit the error but keep on. In HtmlUnit, this is where things come to a halt.
  7. Nothing happens when using HtmlUnit with AJAX, although page works correctly in browsers. What's wrong?
  8. How to modify the outgoing request or incoming response?
  9. How to modify the JavaScript code before it is processed?
  10. How to parse html content given as string?
  11. I'm having problems with cookie support.
  12. I get error messages about redirection being disabled but I've turned it on.
  13. There are many errors reported by the DefaultCssErrorHandler inside my log output. Can I ignore/disable these messages?
  14. What version will feature X be in?
  15. HtmlUnit appears to be leaking memory; what's the deal?
  16. When I run the unit tests with HtmlUnit and code coverage enabled I get a MethodTooLargeException.
I've found a bug, how do I report it?

The best way is to open a bug report in the GitHub issue tracker. Sending a bug report to the mailing list runs the risk that the bug may get lost. Putting it in the bug database guarantees that it won't. Please search the bug database to see if your bug hasn't already been reported before opening a new one.

If you know how to fix the problem then pull requests are gratefully accepted.

[top]


HtmlUnit doesn't support feature X. How do I request this feature?

The best way is to open a bug report in the GitHub issue tracker. See the answer on bug reporting for why the database is preferred.

If you're willing to write the feature yourself, you can always send us a pull requests.

[top]


What are the most important configurations?

    webClient.getOptions().setJavaScriptEnabled(...);
    webClient.getOptions().setThrowExceptionOnFailingStatusCode(...);
    webClient.getOptions().setThrowExceptionOnScriptError(...);
    webClient.getOptions().getProxyConfig().set...();
            

[top]


How to accept an old or bad website certificate?

    webClient.getOptions().setUseInsecureSSL(true);
            

This can also help if you use a proxy like Charles or Fiddler to analyze traffic.

[top]


HtmlElement.click() does not triggering anything (e.g. no Javascript event).

The click() processes the click only if the element is visible. Please check the log output - if the element is not visible there should be a warning in the log like "Calling click() ignored because the target element 'xxxx' is not displayed." or "Calling click() ignored because the target element 'xxxx' is disabled."

If this is the case then you can use a more advanced version of the click method to ignore the visibility checks

click(boolean shiftKey, boolean ctrlKey, boolean altKey, boolean triggerMouseEvents, boolean ignoreVisibility, boolean disableProcessLabelAfterBubbling)
e.g.
  anchor.click(false, false, false, true, true, false);

[top]


When launching my WebClient, I am ending with an js error. But when running on browser, we hit the error but keep on. In HtmlUnit, this is where things come to a halt.

Because HtmlUnit was designed as testing tool the initial WebClient setup forces this behavior (in contrast to real browsers trying ot eat each and every page/javascript code). You can change this by

webClient.getOptions().setThrowExceptionOnScriptError(false);

[top]


Nothing happens when using HtmlUnit with AJAX, although page works correctly in browsers. What's wrong?

The main thread using HtmlUnit may be finishing execution before allowing background threads to run. You have a couple of options:

  1. webClient.setAjaxController(new NicelyResynchronizingAjaxController()); will tell your WebClient instance to re-synchronize asynchronous XHR.
  2. webClient.waitForBackgroundJavaScript(10000); or webClient.waitForBackgroundJavaScriptStartingBefore(10000); just after getting the page and before manipulating it.
  3. Explicitly wait for a condition that is expected be fulfilled when your JavaScript runs, e.g.
    
            //try 20 times to wait .5 second each for filling the page.
            for (int i = 0; i < 20; i++) {
                if (condition_to_happen_after_js_execution) {
                    break;
                }
                synchronized (page) {
                    page.wait(500);
                }
            }
                

[top]


How to modify the outgoing request or incoming response?

You can subclass WebConnectionWrapper and override getResponse() as:


    try (final WebClient webClient = new WebClient()) {
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        // set more options

        // create a WebConnectionWrapper with an (subclassed) getResponse() impl
        new WebConnectionWrapper(webClient) {

            public WebResponse getResponse(WebRequest request) throws IOException {
                WebResponse response = super.getResponse(request);
                if (request.getUrl().toExternalForm().contains("my_url")) {
                    String content = response.getContentAsString();

                    // intercept and/or change content
                    ....

                    // construct replacement response
                    WebResponseData data = new WebResponseData(content.getBytes(response.getContentCharset()),
                            response.getStatusCode(), response.getStatusMessage(), response.getResponseHeaders());
                    response = new WebResponse(data, request, response.getLoadTime());
                }
                return response;
            }
        };

        // use the client as usual
        HtmlPage page = webClient.getPage(uri);
    }
            

[top]


How to modify the JavaScript code before it is processed?

You can register you own implementation of a ScriptPreProcessor as:


    // create a ScriptPreProcessor
    final ScriptPreProcessor myScriptPreProcessor = new ScriptPreProcessor() {

        @Override
        public String preProcess(final HtmlPage htmlPage, final String sourceCode, final String sourceName,
                final int lineNumber, final HtmlElement htmlElement) {

            // modify the source code here

            return sourceCode;
        }
    };

    try (WebClient webClient = new WebClient()) {
        // activate the ScriptPreProcessor
        webClient.setScriptPreProcessor(myScriptPreProcessor);

        // use the client as usual
        final HtmlPage page = webClient.getPage(url);
    }
            

[top]


How to parse html content given as string?

Because HtmlUnit simulates real browsers you have to do some setup before parsing the string. You need a WebClient and an (arbitrary) url (used as base for relative links if there are any).

XHtml


        final String htmlCode = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\""
                + "\"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "  <head>"
                + "    <title>Title</title>"
                + "  </head>"
                + "  <body>"
                + "    content..."
                + "  </body>"
                + "</html> ";
        try (WebClient webClient = new WebClient(browserVersion)) {
            final XHtmlPage page = webClient.loadXHtmlCodeIntoCurrentWindow(htmlCode);
            // work with the xhtml page
        }
            
Html

        final String htmlCode = "<html>"
                + "  <head>"
                + "    <title>Title</title>"
                + "  </head>"
                + "  <body>"
                + "    content..."
                + "  </body>"
                + "</html> ";
        try (WebClient webClient = new WebClient(browserVersion)) {
            final HtmlPage page = webClient.loadHtmlCodeIntoCurrentWindow(htmlCode);
            // work with the html page
        }
            

[top]


I'm having problems with cookie support.

HtmlUnit relies on HttpClient 4 for Cookie handling. If you have an issue, please post it to the HtmlUnit mailing list.

[top]


I get error messages about redirection being disabled but I've turned it on.
26.02.2003 16:07:05 org.apache.commons.httpclient.HttpMethodBase   processRedirectResponse
INFO: Redirect requested but followRedirects is disabled

It's an annoyance that I haven't figured out how to fix yet.

For a variety of reasons, I handle the redirection logic inside HtmlUnit rather than letting commons-httpclient handle it for me. It's commons-httpclient that is displaying that message because I have explicitly disabled its redirection support.

I'd like to filter out that warning message but haven't figured out a clean way of doing it. A number of people have pointed out that it's easy to disable a message if you know which logger is being used. The problem is that there isn't a way to disable the messages without knowing the logger in use.

[top]


There are many errors reported by the DefaultCssErrorHandler inside my log output. Can I ignore/disable these messages?

This is because our CSS parser detects some problem with one of the css resources your page relies on. Usually you will see similar error messages if you open your browser debug output window and enable the css log message.

At least if you are scraping web pages you can ignore this. To suppress the output use the SilentCssErrorHandler or implement your own CSSErrorHandler.

        try (WebClient webClient = new WebClient(browserVersion)) {
            webClient.setCssErrorHandler(new SilentCssErrorHandler());
            ....
        }
            

[top]


What version will feature X be in?

There is no "roadmap" of releases. Features will be added as they are written.

Changes to the product (including new features) are implemented by volunteers in their spare time. If feature X is important to you and nobody seems to be working on it then perhaps you should consider writing it yourself and submitting a pull requests.

[top]


HtmlUnit appears to be leaking memory; what's the deal?

Make sure

  1. that you are using the latest version of HtmlUnit, and
  2. that you are calling WebClient.close() (or use try-with-resources) when you are finished with your WebClient instance.

Like real browsers HtmlUnit maintains a history for every window to be able to support the back() action. If you don't need this you can disable the history by

        try (WebClient webClient = new WebClient(browserVersion)) {
            webClient.getOptions().setHistoryPageCacheLimit(0);
            webClient.getOptions().setHistorySizeLimit(0);
            ....
        }
              

[top]


When I run the unit tests with HtmlUnit and code coverage enabled I get a MethodTooLargeException.

Background: The CSS parser used by HtmlUnit is implemented based on the JavaCC parser generator. Because of the way JavaCC works, the (generated) parser consists of really huge class file (containing huge methods) nearly at the limits of the JVM class byte code. On the other hand Jacoco instruments the byte code (adds more operations) to keep track of the called methods. This is done with a more or less poor approach - without taking the limits of the JVM into account. As a result the instrumented bytecode exceeds the JVM limits and is no longer loadable.

For your tests try to exclude the CssParser (and maybe HtmlUnit itself also) from JaCoCo instrumentation. Usually you like to check the code coverage of your classes and not for HtmlUnit oder the CssParser.

The method to exclude from instrumentation depends on your build tool, there are different ways to do this for maven and gradle. Please check the documentation at JaCoCo at GitHub for more details. There are also several answers available on StackOverflow hot to do this. And please make sure you really exclude from instrumentation and not only from reporting.

[top]