Wednesday, March 30, 2011

Tomcat CPU 100% Issue

In testing environment, we found tomcat (v6.0.18) CPU 100% under no load. To troubleshoot this issue, we first identified the thread took most of CPU using below 3 steps:
Step1 -- use top -H to get which java thread occupies the CPU mostly
Step2 -- use kill -3 to do thread dump
Step3 -- use Step1 java thread pid (Dec Format) to get the thread information from Step2 by nid (Hex format)

The runnable thread took 100% CPU is:
"http-8080-Poller-0" daemon prio=10 tid=0x00002aabd85b7400 nid=0x2dfe runnable [0x000000004b248000..0x000000004b248c90]
   java.lang.Thread.State: RUNNABLE
                at org.apache.tomcat.jni.Poll.poll(Native Method)

After Google, we found it was a bug in Tomcat APR native code. 

The jni native method poll() in poll.c does not check if the socket has no
timeout (i.e. it does not check if max_ttl is negative) before adjusting the
timeout passed to apr_pollset_poll(). 

The timeout passed to apr_pollset_poll() is in the variable 'ptime'.  The
current code sets ptime to zero when there is a valid timeout (usually 2000ms)
and no socket timeout (usually -1000).

These values are common because defaults to pollTime=2000 ms
and soTimeout=-1 sec (which gets adjusted to -1000 ms).
When a zero timeout is passed to apr_pollset_poll(), it is passed along to the
native select() as a zero timeval which causes select() to return immediately
instead of blocking. A negative value should be passed to apr_pollset_poll() for
an infinite timeout, but I don't think that's what we want here.

The observed result is 100% cpu use when Tomcat is idle because calls to poll()
are made continuously. These calls return immediately.

The solution is to recognize when max_ttl is negative, and if so; do not use it
to adjust the 'ptime' variable passed as an argument to poll().  Here is a patch

APR (Apache Portable Runtime) is tomcat native library which is tomcat core to provide OS/native process functionalities. Tomcat has 3 connector types: HTTP, AJP and APR. Usually we use http and have following configurations

Tomcat server.xml default configuration has below sample:

    <!-- A "Connector" represents an endpoint by which requests are received
         and responses are returned. Documentation at :
         Java HTTP Connector: /docs/config/http.html (blocking & non-blocking) (JIO/NIO)
         Java AJP  Connector: /docs/config/ajp.html
         APR (HTTP/AJP) Connector: /docs/apr.html
         Define a non-SSL HTTP/1.1 Connector on port 8080
    <Connector port="8080" protocol="HTTP/1.1"
               redirectPort="8443" />

On our testing server, we have tomcat 6.0.18 and native library 1.1.14 installed. And if we don't specify protocol, it seems to default to use APR with below configuration (no protocol specified)

<Connector port="8080" maxHttpHeaderSize="8192"
maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
enableLookups="false" redirectPort="8443" acceptCount="100"
connectionTimeout="0" disableUploadTimeout="true"       
compression="on" compressionMinSize="1024" URIEncoding="UTF-8"
noCompressionUserAgents="gozilla, traviata"

Usually we need specify protocol="org.apache.coyote.http11.Http11AprProtocol" for APR connector. If we don't have native library installed, we usually see below INFO when start tomcat.

2006-06-05 19:50:26 org.apache.catalina.core.AprLifecycleListener
INFO: The Apache Tomcat Native library which allows optimal performance in
production environments was not found on the
java.library.path: /usr/local/opt/sun-jdk-

Updated 4/25/2012
For some system not supporting top -H, we can use ps command which has -m or -L to list out thread info. Examples:
ps H -eo user,pid,ppid,tid,time,%cpu,cmd -p 17545 --sort=%cpu
For sun Linux, use ps -efL, -L option will show the LWPs
Also try prstat -L -v -u weblogic
ps -emo pid,tid,class,rtprio,stat,comm ps -eL 18975

1 comment:


    This is a good post to explain tomcat connectors. For details, the "Tomcat: the definite guide" book provides more info.