Monday, March 2, 2009

Database hangs

Reported database "hangs" generally fall into one of the following categories:
  1. Hung database
  2. Hung session(s)
  3. Overall instance/database performance
  4. Query performance
We will examine only the first category, the hung database, since that is the case of interest here.

Oracle Support defines a "true" database hang as "an internal deadlock or a cyclical dependency between two or more processes."
When dealing with DML locks (that is, enqueue type TM), Oracle is able to detect this dependency and roll back the current statement in one of the sessions (raising ORA-00060) to break the cyclical condition.
On the other hand, when this situation occurs with internal kernel-level resources (such as latches or pins), Oracle is usually unable to automatically detect and resolve the deadlock.
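To make the "cyclical dependency" idea concrete, here is a minimal Python sketch of cycle detection in a wait-for graph. It is only an illustration, not Oracle's internal algorithm: the process names are made up, and it assumes each blocked process waits on exactly one holder.

```python
from typing import Dict, List, Optional


def find_wait_cycle(waits_for: Dict[str, str]) -> Optional[List[str]]:
    """Return one cycle in a wait-for graph, or None if there is none.

    waits_for maps each blocked process to the single process that holds
    the resource it is waiting on (a simplifying assumption).
    """
    for start in waits_for:
        path: List[str] = []            # processes visited on this walk, in order
        position: Dict[str, int] = {}   # process -> index in path
        node: Optional[str] = start
        # Follow the chain of waiters until it ends or revisits a process.
        while node is not None and node not in position:
            position[node] = len(path)
            path.append(node)
            node = waits_for.get(node)
        if node is not None:
            # The walk came back to a process already on the path: a cycle.
            return path[position[node]:]
    return None


# Hypothetical example: P1 waits on P2, P2 on P3, P3 on P1; P4 is only a victim.
example_waits = {"P1": "P2", "P2": "P3", "P3": "P1", "P4": "P1"}
example_cycle = find_wait_cycle(example_waits)
```

In this example P1, P2 and P3 form the cycle, while P4 is blocked behind it without being part of it. A hang usually looks the same from the outside: many sessions stuck, but only a few of them in the actual deadlock.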
If you encounter a database hang situation, you need to take system state dumps so that Oracle Support can begin to diagnose the root cause of the problem.
Whenever you take such dumps for a hang, it is important to take at least three of them a few minutes apart, on all instances of your database.
That way, the dumps show whether a resource is still being held from one dump to the next.
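The reason for taking several dumps can be sketched in a few lines of Python. This is a rough illustration only: it assumes each dump has already been reduced to a simple mapping of resource name to holder, which is a hypothetical format and not what a real system state dump looks like.

```python
from typing import Dict, List, Tuple

# Hypothetical reduced form of one dump: resource name -> holding process.
Snapshot = Dict[str, str]


def persistent_holders(snapshots: List[Snapshot]) -> List[Tuple[str, str]]:
    """Return the (resource, holder) pairs that appear in every snapshot."""
    if not snapshots:
        return []
    common = set(snapshots[0].items())
    for snap in snapshots[1:]:
        # Keep only the pairs that are also present in this later snapshot.
        common &= set(snap.items())
    return sorted(common)


# Three made-up snapshots taken a few minutes apart.
dumps = [
    {"resource_A": "proc_17", "resource_B": "proc_21"},
    {"resource_A": "proc_17", "resource_B": "proc_30"},
    {"resource_A": "proc_17"},
]
stuck = persistent_holders(dumps)
```

Here only resource_A is held by the same process in all three snapshots, so it is the candidate worth a closer look. A holder that changes between dumps points to something slow but still moving, not to a true hang.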

RAC databases have more than one instance using the same set of resources, and a resource may be requested by more than one instance.
Resource sharing is well managed by Global Cache Services (GCS) and Global Enqueue Services (GES).
However, in some cases, the resource management operations could run into a deadlock situation and the entire database may hang because of serialization issues.

Thus I don't see anything that you can do through your product to detect and resolve the hang.

MetaLink Note:175006.1 - Steps to generate HANGANALYZE trace files
MetaLink Note:215858.1 - Interpreting HANGANALYZE trace files to diagnose hanging and performance problems
