oracle DBMS_STATS.GATHER_TABLE_STATS

The Oracle RDBMS allows you to collect statistics of many different kinds as an aid to improving performance. DBMS_STATS package is concerned with optimizer statistics only. Given that Oracle sets automatic statistics collection of this kind on by default, DBMS_STATS package is intended for only specialized cases.

For GATHER_TABLE_STATS

Most enterprise databases, Oracle included, use a cost-based optimizer to determine the appropriate query plan for a given SQL statement. This means that the optimizer uses information about the data to determine how to execute a query rather than relying on rules (this is what the older rule-based optimizer did).

For example, imagine a table for a simple bug-tracking application

CREATE TABLE issues (

  issue_id number primary key,

  issue_text clob,

  issue_status varchar2(10)

);

CREATE INDEX idx_issue_status

    ON issues( issue_status );

If I’m a large company, I might have 1 million rows in this table. Of those, 100 have an issue_status of ACTIVE, 10,000 have an issue_status of QUEUED, and 989,900 have a status of COMPLETE. If I want to run a query against the table to find my active issues

SELECT *

  FROM issues

WHERE issue_status = ‘ACTIVE’

the optimizer has a choice. It can either use the index on issue_status and then do a single-row lookup in the table for each row in the index that matches or it can do a table scan on the issues table. Which plan is more efficient will depend on the data that is in the table. If Oracle expects the query to return a small fraction of the data in the table, using the index would be more efficient. If Oracle expects the query to return a substantial fraction of the data in the table, a table scan would be more efficient.

DBMS_STATS.GATHER_TABLE_STATS is what gathers the statistics that allow Oracle to make this determination. It tells Oracle that there are roughly 1 million rows in the table, that there are 3 distinct values for the issue_status column, and that the data is unevenly distributed. So Oracle knows to use an index for the query to find all the active issues. But it also knows that when you turn around and try to look for all the closed issues

SELECT *

  FROM issues

WHERE issue_status = ‘CLOSED’

that it will be more efficient to do a table scan.

Gathering statistics allows the query plans to change over time as the data volumes and data distributions change. When you first install the issue tracker, you’ll have very few COMPLETED issues and more ACTIVE and QUEUED issues. Over time, the number of COMPLETED issues rises much more quickly. As you get more rows in the table and the relative fraction of those rows that are in the various statuses change, the query plans will change so that, in the ideal world, you always get the most efficient plan possible.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s