One Long and Nasty Recovery

One of the occupational hazards of being a production DBA is the proverbial 2am call when the production Oracle database crashes.

Oracle has steadily gotten better at self-correcting the issues that cause crashes, and in a well-maintained environment with properly sized servers, UPS power, solid backups and Change Control, such outages are mercifully rare.

When they do occur, however, they quickly expose hitherto unforeseen gaps in the production support plan and test the professionalism of all concerned.

The following is a summary of a recent production outage: the symptoms, the causes and the resolutions. Although Oracle’s Metalink system and Oracle Support Services assisted in identifying the solution, a great deal of work was required from the production DBA staff, so some of these notes may prove useful to others, either as study material or, in the worst case, as verification for anyone facing the same scenario.

Please note for reasons of confidentiality some logs have been edited to obscure client and machine names. Some timings are from memory.

The call actually came shortly after midnight on 5th December. The production database had crashed. Oracle would start up and open normally, but then immediately crash again with an “end-of-file on communication channel” error (ORA-03113).

Investigating the alert log we found the following:

Errors in file /opt/oracle/proddb/admin/bdump/proddb_smon_19850.trc:
ORA-07445: exception encountered: core dump [kghstack_err()+100] [SIGBUS] [Invalid address alignment] [0xC302010F05C40A37] [] []
ORA-00607: Internal error occurred while making a change to a data block
ORA-00602: internal programming exception
ORA-07445: exception encountered: core dump [kghstack_err()+100] [SIGBUS] [Invalid address alignment] [0xC302010F05C40A37] [] []
ORA-00607: Internal error occurred while making a change to a data block
ORA-00602: internal programming exception
ORA-07445: exception encountered: core dump [kghstack_err()+100] [SIGBUS] [Invalid address alignment] [0xC302010F05C40A37] [] []
ORA-00600: internal error code, arguments: [kghstack_free2], [], [], [], [], [], [], []
Fri Dec  4 23:58:16 2009
Errors in file /opt/oracle/proddb/admin/bdump/proddb_pmon_19828.trc:
ORA-00474: SMON process terminated with error

 
These errors indicate that Oracle is trying to apply UNDO to a dirty block. Oracle writes uncommitted data to disk as part of the checkpoint process, so at any point in time there may be uncommitted changes in the Oracle data files. Once a commit occurs, the changes are considered permanent. If a rollback is issued, the changes are undone using the data from the UNDO segments.

When a database crashes, crash recovery is performed on the data files to make them consistent with the control files, which allows the database to open. Once the database is open, if there is pending UNDO from uncommitted transactions, SMON applies that UNDO to roll back the uncommitted changes.

These errors showed that Oracle was failing during the application of the UNDO.

To confirm this theory we added the following to the INIT.ORA file of the database:

event="10513 trace name context forever, level 2"

This event disables transaction recovery, so the pending UNDO is never applied. The database would now open and stay open, but this meant there were data blocks containing uncommitted data, a state not suitable for continued production processing.
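
For reference, the same event can also, as far as I recall, be set and cleared dynamically in a running instance rather than via INIT.ORA:

SQL> alter system set events '10513 trace name context forever, level 2';
SQL> alter system set events '10513 trace name context off';

Either way, the event must be removed once the underlying problem is fixed, otherwise transaction recovery remains disabled.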

The question, therefore, was why the application of UNDO was causing the database to crash. The obvious answer was that the UNDO itself was corrupted in some way, but a DBV scan of the UNDO data file showed it to be clean, as did a scan of SYSTEM. If this was corruption, the problem lay elsewhere.

The database in question is 1.1TB in size and consists of 87 data files. A sequential execution of DBV against each file would take hours to complete, and this database is a critical production system. The server the database resides on is a Sun Microsystems M5000 with 32 CPUs and 64GB of RAM, so it seemed prudent to leverage this computing power and reduce the time to verify the data files by running multiple DBV sessions in parallel.

A quick SQL script was used to generate the parallel-DBV commands as follows:

SQL> spool dbv_parallel.ksh
SQL> set linesize 255
SQL> set pagesize 999
SQL> select
2    'nohup dbv blocksize=8192 file='||name||' logfile='||
3     substr(name,instr(name,'/',-1)+1,length(name))||
4     '.log &' "-- ksh command"
5  from v$datafile
6  where 1=1
7  and file# >= 1
8  and file# <= 30;

-- ksh command
---------------------------------------------------------------------------------
nohup dbv blocksize=8192 file=/opt/oradata/system_01.dbf logfile=system_01.dbf.log &
nohup dbv blocksize=8192 file=/opt/oradata/undotbs_01.dbf logfile=undotbs_01.dbf.log &
nohup dbv blocksize=8192 file=/opt/oradata/users_01.dbf logfile=users_01.dbf.log &
nohup dbv blocksize=8192 file=/opt/oradata/tools_01.dbf logfile=tools_01.dbf.log &

<output removed to aid clarity>

The SQL script above was spooled to a ksh script that uses the UNIX nohup command to launch multiple DBV processes, each one verifying a different data file and outputting to a separate log file.

The dbv_parallel.ksh script was then executed from the korn shell prompt and left to run until all processes completed. The SQL script was executed again, this time for data files 31 through 60, and the process repeated. Finally, data files 61 through 87 were verified.

This technique reduced the verification time from several hours to less than 45 minutes.
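
For anyone wanting to reproduce this, the batching can also be driven from a small ksh loop rather than re-running the SQL for each range of file numbers. This is only a sketch, assuming an 8K block size and that the data file names have been spooled, one per line, into a file called datafiles.txt:

#!/bin/ksh
# launch one background DBV session per data file listed in datafiles.txt
while read df
do
  nohup dbv blocksize=8192 file=${df} logfile=$(basename ${df}).log &
done < datafiles.txt
wait   # return only when every DBV session has finished

On a smaller server the list could first be split into batches, much as we did with the file# ranges above.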

Now we had 87 log files from 87 DBV sessions. All but nine of the log files were clean; those nine contained entries such as the following:

DBVERIFY: Release 10.2.0.4.0 - Production on Sun Dec 6 23:02:18 2009
Copyright (c) 1982, 2007, Oracle.  All rights reserved.
DBVERIFY - Verification starting : FILE = /opt/oracle/proddb/oradata/WIN_SSRUK_TABLE01.dbf
Block Checking: DBA = 100805828, Block Type = KTB-managed data block
data header at 0x100283abc
kdbchk: the amount of space used is not equal to block size
used=5469 fsc=0 avsp=2706 dtl=8000
Page 142532 failed with check code 6110
Block Checking: DBA = 100882367, Block Type = KTB-managed data block
data header at 0x100279aec

All of the corruption was in the WIN_SSRUK_TABLE tablespace, but it was very extensive, covering several hundred blocks. This level of corruption usually suggests an underlying hardware problem, so the UNIX admin team was asked to verify the operation of the disk subsystem.

The disk subsystem was checked and found to be clean. Given the extent of the corruption, and the fact that we were unable to open the database without the event shown earlier, it was decided to restore the whole database from the previous day’s backup and roll forward to just before the time of the crash.

This recovery was performed, taking several hours to complete. The database opened without further errors and users were allowed back on. It was not long, however, before further problems were reported from the user community, with transactions failing with ORA-600 errors.

A further check of the data files with DBV showed the corruption was back.

We then restored the backup to a different server and did not perform any roll forward. This test showed that the corruption had been present even before the previous night’s backup. We then checked the backups from two and three nights prior, with the same result. It was now clear this corruption had been present for some time.

The database was backed up using SQL Backtrack, which is capable of detecting corrupt blocks in Oracle, but it would seem this type of corruption is not detected by SQL Backtrack, and it had gone unnoticed for some time.

Several hours into the recovery, our production database was still down and our backups were proving to be of little value in solving this critical issue. We needed to recover the data from the corrupt objects and rebuild what we could.

At this point we knew which files had corruption, but not which objects. The DBV logs had reported the DBA (data block address) of each corrupt block; we now needed to convert these DBAs into object names.

In Metalink note 113005.1, Oracle provides the cdba.sql script to take a DBA and convert it into a file ID and block ID. This can then be compared against the DBA_EXTENTS view to get the object owner, name and type.
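
For a single DBA, the same lookup can be done directly in SQL with DBMS_UTILITY against DBA_EXTENTS. A quick sketch using the first DBA from the DBV log above (it assumes the relative and absolute file numbers match, which is normally the case):

select owner, segment_name, segment_type
from   dba_extents
where  file_id = dbms_utility.data_block_address_file(100805828)
and    dbms_utility.data_block_address_block(100805828)
       between block_id and block_id + blocks - 1;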

To streamline the process for hundreds of DBAs, I extended Oracle’s code to add the object lookup, as follows:

-- cdba2: an extended version of the cdba.sql script from Metalink note 113005.1.
-- It takes a data block address (iblock) in hex (imode => 'H') or decimal
-- (imode => 'D'), decodes it into a file number and block number, then maps
-- that location to its owning segment via DBA_EXTENTS.
CREATE OR REPLACE PROCEDURE cdba2 ( iblock VARCHAR2, imode VARCHAR2 ) AS
x          NUMBER;
digits#    NUMBER;
results    NUMBER := 0;
file#      NUMBER := 0;
block#     NUMBER := 0;
cur_digit  CHAR(1);
cur_digit# NUMBER;

-- the cursor binds file# and block# when it is opened, i.e. after they
-- have been populated from the decoded DBA below
cursor crDBA is
SELECT
  tablespace_name,
  segment_type,
  owner,
  segment_name
FROM dba_extents
WHERE file_id = file#
AND block# BETWEEN block_id
AND block_id + blocks - 1;

BEGIN
  IF upper(imode) = 'H' THEN
    -- convert the hex string into a decimal DBA one digit at a time
    digits# := length( iblock );

    FOR x IN 1..digits# LOOP
      cur_digit := upper(substr( iblock, x, 1 ));
      IF cur_digit IN ('A','B','C','D','E','F') THEN
        cur_digit# := ascii(cur_digit)-ascii('A')+10;
      ELSE
        cur_digit# := to_number(cur_digit);
      END IF;
      results := (results * 16) + cur_digit#;
    END LOOP;
  ELSE
    IF upper(imode) = 'D' THEN
      results := to_number(iblock);
    ELSE
      dbms_output.put_line('H=Hex,D=Decimal');
      RETURN;
    END IF;
  END IF;

  -- split the DBA into its file number and block number
  file#  := dbms_utility.data_block_address_file(results);
  block# := dbms_utility.data_block_address_block(results);

  dbms_output.put_line('.');
  dbms_output.put_line( 'The file is ' || file# );
  dbms_output.put_line( 'The block is ' || block# );

  -- report the segment that owns the block
  for recDBA in crDBA loop
    dbms_output.put_line('.');
    dbms_output.put_line('The object is '||recDBA.owner||'.'||recDBA.segment_name );
  end loop;
END;
/
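
The procedure reports its findings via DBMS_OUTPUT, so serveroutput must be switched on in the session before calling it:

SQL> set serveroutput on size 1000000
SQL> exec cdba2(100805828,'D');

For that first DBA, the output resolves to file 24, block 142532 – the same block number DBV flagged as page 142532 in the log above.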

Now all we needed were the DBAs from the nine DBV logs that reported corruption. To extract them, a little UNIX command-line magic was used as follows:

cat WIN_SSRUK_TABLE01.dbf.log | grep DBA | awk '{print $5}' |
  cut -d"," -f1 > file1.txt
cat WIN_SSRUK_TABLE02.dbf.log | grep DBA | awk '{print $5}' |
  cut -d"," -f1 > file2.txt
cat WIN_SSRUK_TABLE03.dbf.log | grep DBA | awk '{print $5}' |
  cut -d"," -f1 > file3.txt
cat WIN_SSRUK_TABLE04.dbf.log | grep DBA | awk '{print $5}' |
  cut -d"," -f1 > file4.txt
cat WIN_SSRUK_TABLE05.dbf.log | grep DBA | awk '{print $5}' |
  cut -d"," -f1 > file5.txt
cat WIN_SSRUK_TABLE06.dbf.log | grep DBA | awk '{print $5}' |
  cut -d"," -f1 > file6.txt
cat WIN_SSRUK_TABLE07.dbf.log | grep DBA | awk '{print $5}' |
  cut -d"," -f1 > file7.txt
cat WIN_SSRUK_TABLE08.dbf.log | grep DBA | awk '{print $5}' |
  cut -d"," -f1 > file8.txt
cat WIN_SSRUK_TABLE09.dbf.log | grep DBA | awk '{print $5}' |
  cut -d"," -f1 > file9.txt

This produced nine files, each one containing the list of DBAs from one of the nine corrupted data files. The following shows the first five lines of file1.txt:

$ cat file1.txt
100805828
100882367
100891553
100897827
100901462

Now we needed a complete SQL script to call the cdba2 PL/SQL procedure for each of the DBAs. To generate it, each list of DBAs was processed as follows:

cat file1.txt | awk '{print "exec cdba2("$1",D);"}' > call.sql
cat file2.txt | awk '{print "exec cdba2("$1",D);"}' >> call.sql
cat file3.txt | awk '{print "exec cdba2("$1",D);"}' >> call.sql
cat file4.txt | awk '{print "exec cdba2("$1",D);"}' >> call.sql
cat file5.txt | awk '{print "exec cdba2("$1",D);"}' >> call.sql
cat file6.txt | awk '{print "exec cdba2("$1",D);"}' >> call.sql
cat file7.txt | awk '{print "exec cdba2("$1",D);"}' >> call.sql
cat file8.txt | awk '{print "exec cdba2("$1",D);"}' >> call.sql
cat file9.txt | awk '{print "exec cdba2("$1",D);"}' >> call.sql

The final call.sql script contained hundreds of calls to cdba2, one for each corrupted DBA. The following shows the first five lines of the finalized script:

exec cdba2(100805828,'D');
exec cdba2(100882367,'D');
exec cdba2(100891553,'D');
exec cdba2(100897827,'D');
exec cdba2(100901462,'D');

My awk skills are not sufficient to know how to get the ‘D’ quoted in the output line, so I used the vi editor’s :%s/D/'D'/g substitution to complete the script.
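
For the record, the quoting can be done inside awk itself by passing a literal quote character in as a variable. A sketch that should generate the same call.sql in one pass (nawk or any POSIX awk):

# q holds a literal single quote so awk can emit 'D' directly
awk -v q="'" '{ print "exec cdba2(" $1 "," q "D" q ");" }' file[1-9].txt > call.sql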

The script was executed and revealed that only one object was affected by the corruption: a table called MATCH_QUEUE_HEADER, owned by the SSRUK user. However, this table was very large, at over 54 million rows and over 487,000 8KB blocks.

To recover what data we could, the first step was to allow Oracle to skip the corrupt blocks:

SQL> exec DBMS_REPAIR.SKIP_CORRUPT_BLOCKS('SSRUK',
  'MATCH_QUEUE_HEADER');
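
As a sanity check, the attribute can be confirmed from the data dictionary; if memory serves, DBA_TABLES has a SKIP_CORRUPT column that shows ENABLED once the call has succeeded:

SQL> select skip_corrupt from dba_tables
     where owner = 'SSRUK' and table_name = 'MATCH_QUEUE_HEADER';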

We then created a table matching the structure of the original MATCH_QUEUE_HEADER, but in a different tablespace (the false predicate 2=1 ensures no rows are copied at this stage):

SQL> create table TUSC_MATCH_QUEUE_HEADER
tablespace WIN_SSRUK_AAS_TABLE as
( select * from SSRUK.MATCH_QUEUE_HEADER where 2=1 );

Time was pressing and we needed to complete the recovery as quickly as possible, so we decided to use techniques to make the data transfer faster. Since the database was not fully recoverable at this point, these techniques did not compromise our recoverability any further, and a full backup was going to be needed upon completion in any case:

SQL> alter table TUSC_MATCH_QUEUE_HEADER nologging;
SQL> insert /*+ append */ into TUSC_MATCH_QUEUE_HEADER
( select /*+ parallel (a,32,1) */ * from
  SSRUK.MATCH_QUEUE_HEADER a );

This technique inserted 54,530,067 records in under five minutes. We were able to recover all but a few dozen records from the original table.
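
As an aside, the statement above is a serial direct-path insert fed by a parallel query. Had still more throughput been needed, the insert itself could also have been parallelised by enabling parallel DML for the session first – a sketch only, not what we actually ran on the night:

SQL> alter session enable parallel dml;
SQL> insert /*+ append parallel(TUSC_MATCH_QUEUE_HEADER,32) */
     into TUSC_MATCH_QUEUE_HEADER
     select /*+ parallel(a,32) */ * from SSRUK.MATCH_QUEUE_HEADER a;
SQL> commit;

A direct-path insert has to be committed (or rolled back) before the same session can query the table again.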

However, the work was still not complete. There were grants and indexes on the original table that needed to be recreated as well. The following script was used to capture the grants from the original table:

select
  'grant '||privilege||' on '||owner||'.'||
  table_name||' to '||grantee||
decode(grantable,'NO',';',' with grant option;')
from dba_tab_privs
where 1=1
and owner = 'SSRUK'
and table_name = 'MATCH_QUEUE_HEADER';

This script generated the following output:

grant DELETE on SSRUK.MATCH_QUEUE_HEADER to ROLE_SSRUK;
grant INSERT on SSRUK.MATCH_QUEUE_HEADER to ROLE_SSRUK;
grant SELECT on SSRUK.MATCH_QUEUE_HEADER to ROLE_SSRUK;
grant UPDATE on SSRUK.MATCH_QUEUE_HEADER to ROLE_SSRUK;
grant SELECT on SSRUK.MATCH_QUEUE_HEADER to NTUKPROC;
grant SELECT on SSRUK.MATCH_QUEUE_HEADER to NTUK_READ;

The indexes were manually recreated by inspecting the existing table. They were as follows:

ALTER TABLE match_queue_header
  ADD CONSTRAINT match_queue_header_ind_key
  PRIMARY KEY (queue_id, proposed_match_no)
  USING INDEX TABLESPACE win_index;

CREATE INDEX match_queue_headerixa ON
  match_queue_header (corr_acc_no, queue_id, pass_id)
  TABLESPACE win_index parallel 32;

CREATE INDEX match_queue_headerixb ON
  match_queue_header (event_group_id)
  TABLESPACE win_index parallel 32;

CREATE UNIQUE INDEX match_queue_headerixc ON
  match_queue_header
  (real_match_no, corr_acc_no, proposed_match_no, queue_id)
  TABLESPACE win_index parallel 32;

The original table was dropped and the new one renamed to take the place of the old table. Then the index creation script was executed. All was going well until the last index blew out with an ORA-07445 error.

A review of the trace file showed the following:

*** 2009-12-07 05:12:40.045
*** ACTION NAME:() 2009-12-07 05:12:40.044
*** MODULE NAME:(SQL*Plus) 2009-12-07 05:12:40.044
*** SERVICE NAME:(SYS$USERS) 2009-12-07 05:12:40.044
*** SESSION ID:(828.43) 2009-12-07 05:12:40.044
Exception signal: 11 (SIGSEGV), code: 1 (Address not mapped to object), addr: 0x1070b4b000001c3, PC: [0x1021a9840, smboQbNxt()+96]
Registers:
----------
%o0: 0xffffffff7d7916f0 %o1: 0x000000000000004b %o2: 0x01070b4b000001c5
%o3: 0x0000000000000007 %o4: 0x00000000000000c3 %o5: 0x0000000000000001
%o6: 0xffffffff7ffeae10 %o7: 0x00000001021a1c94
%g1: 0x0000000000000001 %g2: 0xffffffff7d7bbbe9 %g3: 0xffffffff7d791760
%g4: 0x0000000000000000 %g5: 0x01070b4b000001c3 %g6: 0x0000000000000000
%g7: 0xffffffff7d500200
%pc: 0x00000001021a9840 %npc: 0x00000001021a9844 %y: 0x0000000000000000
Stack info:
----------
ss_sp: 0xffffffff7e000000 ss_size: 0x0000000002000000 ss_flags: 0
*** 2009-12-07 05:12:40.049
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [smboQbNxt()+96] [SIGSEGV] [Address not mapped to object] [0x1070B4B000001C3] [] []
Current SQL statement for this session:
CREATE UNIQUE INDEX match_queue_headerixc

This new error – an ORA-07445 with argument smboQbNxt – was looked up in My Oracle Support. Metalink note 400745.1 pointed to bug 4655998, a bug in the new sorting algorithm introduced in 10g. The permanent fix is a patch upgrade, which is not going to happen in this type of production-down scenario.

However, a second workaround was to issue the following:

SQL> alter session set "_newsort_enabled" = false;

With that set, the index created without further error, and all four indexes were built in about 15 minutes.

The grants script was executed and the related synonyms were recompiled. Finally, we wanted to gather statistics on the new object as follows:

exec dbms_stats.gather_table_stats(ownname=>'SSRUK',
  tabname=>'MATCH_QUEUE_HEADER',
  estimate_percent=>35,
  method_opt=>'FOR ALL INDEXED COLUMNS SIZE 1',
  cascade=>TRUE, degree=>32);

The analysis ran for about two minutes before failing:

BEGIN dbms_stats.gather_table_stats(ownname=>'SSRUK',tabname=>'MATCH_QUEUE_HEADER',estimate_percent=>35, method_opt=>'FOR ALL INDEXED COLUMNS SIZE 1',cascade=>FALSE); END;
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel

A process of elimination followed, and we found that by removing the estimate_percent clause the statistics gathering completed without further error.
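
For completeness, the call that finally completed was simply the statement above with the estimate_percent argument removed (the remaining arguments are from memory), letting Oracle choose its own sample size:

exec dbms_stats.gather_table_stats(ownname=>'SSRUK',
  tabname=>'MATCH_QUEUE_HEADER',
  method_opt=>'FOR ALL INDEXED COLUMNS SIZE 1',
  cascade=>TRUE, degree=>32);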

The total outage from first issue to final resolution was about 36 hours. This included several test recoveries, which determined that the backups were not going to help, and time spent looking up the various ORA-600 and ORA-07445 errors that occurred along the way.

For reasons of confidentiality I cannot name the client whose system this occurred on, but I will say the recovery was a team effort, and I extend my thanks to Liju, John, Tony and Santi, as well as to the numerous members of the UNIX, SAN and Backup teams, for their assistance in this Oracle recovery.
