CA Clarity - Find and Kill All Sessions Running a Portlet That Consumes Too Many Resources

CA Clarity is very powerful and flexible. Custom portlets can be created that perform very complex dynamic queries. These queries can sometimes consume too many resources, for a variety of reasons, including:

  • Bad SQL (cartesian join)
  • Bad Execution Plan (stale statistics or other factors)
  • Lack of Indexes (on custom ODF attributes)
  • Too many users hitting the portlet at the same time (e.g. start of the day, Fridays, etc.)

We'll see the issue manifest itself in a couple of ways, such as:

  1. iowait > 10%
  2. CPU on one or more CPUs sustained > 50%
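
Before killing anything, it helps to confirm from the database side that one statement is responsible. This is a generic sketch (not Clarity-specific) that counts active sessions per SQL hash:

SELECT SQL_HASH_VALUE, COUNT(*) ACTIVE_SESSIONS
  FROM V$SESSION
 WHERE STATUS = 'ACTIVE'
   AND SQL_HASH_VALUE <> 0
 GROUP BY SQL_HASH_VALUE
 ORDER BY COUNT(*) DESC;

A single hash with many active sessions at the top of this list is a strong sign you're looking at a runaway query.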

When this situation occurs and the culprit is a portlet, killing the sessions is safe; Clarity will recover. Note that killing BG processes and other queries produced by the Clarity application may not be safe, so check with CA Support first.

Killing the sessions, in the case of our runaway portlet, is the fastest way to return the system to normal operation.

Note also that killing the query doesn't fix the problem, it buys you time to get to the root cause.

Here are the steps to follow to bring the situation under control:

Find the hash value of the SQL statement (using AWR, Statspack, OEM, or Quest Spotlight).

In our example let's assume this hash is

3584585137
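
If you don't have AWR, Statspack, or a monitoring tool handy, you can often find the hash directly in V$SQL by searching for a distinctive fragment of the portlet's SQL text (the fragment below is a placeholder; substitute text from your own portlet's query):

SELECT HASH_VALUE, OLD_HASH_VALUE, EXECUTIONS, BUFFER_GETS, SQL_TEXT
  FROM V$SQL
 WHERE SQL_TEXT LIKE '%your_distinctive_fragment%'
 ORDER BY BUFFER_GETS DESC;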

Use this SQL statement to find the sessions from the app server that are running the query:

SELECT DISTINCT 'ALTER SYSTEM KILL SESSION '
       || CHR(39)
       || SID
       || ', '
       || SERIAL#
       || CHR(39)
       || ' IMMEDIATE;' STMT
  FROM V$SESSION A, V$SQL B
 WHERE A.SQL_HASH_VALUE = B.HASH_VALUE
   AND B.OLD_HASH_VALUE = '3584585137';

Save these statements and run them in SQL Developer, Spotlight, Toad, or SQL*Plus.

Note that any inactive or already-killed sessions will error out; ignore those errors. Killing a session that is already dead won't hurt.

You can also use this anonymous PL/SQL block:

-- Run with SET SERVEROUTPUT ON in SQL*Plus to see each statement as it runs
BEGIN
  FOR r_index IN
  (
    SELECT DISTINCT 'ALTER SYSTEM KILL SESSION '
           || CHR(39)
           || SID || ', '
           || SERIAL#
           || CHR(39)
           || ' IMMEDIATE' STMT
      FROM V$SESSION A, V$SQL B
     WHERE A.SQL_HASH_VALUE = B.HASH_VALUE
       AND B.OLD_HASH_VALUE = '3584585137'
       AND A.STATUS = 'ACTIVE'
  )
  LOOP
    DBMS_OUTPUT.PUT_LINE(r_index.stmt);  -- log the statement before running it
    EXECUTE IMMEDIATE r_index.stmt;
  END LOOP;
END;
/

Now that you've cleaned up the performance issue, consider doing one of the following to prevent it in the future (listed in order of likely root cause and ease of implementation):

  1. Tune the portlet SQL (use EXPLAIN PLAN/tkprof to refine the SQL, add indexes, etc.)
  2. Check your stats and recalculate them more often or at the end of batch loads
  3. Create a Stored Outline
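
As a sketch of the stats refresh and stored outline steps (the schema, table, and outline names here are examples, not Clarity's actual names):

-- Refresh statistics on a custom-attribute table, e.g. after a batch load
EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => 'CLARITY', tabname => 'ODF_CA_PROJECT', cascade => TRUE);

-- Pin the currently good plan with a stored outline
CREATE OR REPLACE OUTLINE my_portlet_outline
FOR CATEGORY clarity_portlets
ON SELECT /* the portlet's full SQL text goes here */ ...;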

The best advice: have your power users or consultants test new portlets in a "prod like" environment for performance problems BEFORE they put them into production. Make sure someone runs an EXPLAIN PLAN on any SQL you put into Clarity.

What's "prod like" ?

  • Same record counts as PROD
  • Nearly the same hardware (RAM, CPUs, disk)

Capture the SQL with a SQL TRACE on the Tomcat App Server if you can't get it any other way.
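
One way to do that (assuming Oracle 10g or later; 'CLARITY' is an example service name, so check DBA_SERVICES for yours) is to enable tracing on the service the Tomcat data source connects through, reproduce the load, then turn it off:

-- Enable trace (with wait events) for all sessions on the app server's service
EXEC DBMS_MONITOR.SERV_MOD_ACT_TRACE_ENABLE(service_name => 'CLARITY', waits => TRUE, binds => FALSE);

-- ... reproduce the portlet load, then disable tracing ...
EXEC DBMS_MONITOR.SERV_MOD_ACT_TRACE_DISABLE(service_name => 'CLARITY');

The trace files land in USER_DUMP_DEST (or the diagnostic trace directory on 11g+) and can be summarized with tkprof.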

The time you take on these preventative steps will save you the cost and trouble of a Sunday night outage...