ORA-01552: cannot use system rollback segment for non-system tablespace

Database : Oracle Database 19.28 EE

Operating System : Redhat Linux 8.10 64 Bit

Database Type : Oracle RAC DB

Incident Summary:

One of the critical production applications started encountering intermittent failures with the error:

Application Log Error Message:

org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1590)

org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)

java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)

java.lang.Thread.run(Thread.java:750)

Caused by exception:

oracle.jdbc.OracleDatabaseException

Message:

OR***** cannot use system rollback segment for non-system tablespace 'APPTBS_A1'

Stacktrace:

oracle.jdbc.driver.T4CTTIoer11.processError(T4CTTIoer11.java:498)

oracle.jdbc.driver.T4CTTIoer11.processError(T4CTTIoer11.java:446)

oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:1054)

oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:623)

oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:252)

The application was working without any issues on the existing 2-node Oracle RAC database. The issue was observed only after the RAC cluster was expanded from 2 nodes to 4 nodes by adding two additional instances. This is a 4-node Oracle RAC database environment, and the issue started immediately after the new nodes were introduced into the cluster. Application team confirmed that there were no code or configuration changes at the application level. The application primarily performs INSERT operations.

Impact :

- Approximately 80–90% of transactions were successful.

- Around 10% of transactions failed intermittently.

- Failures resulted in transaction declines.

- Although the impact was partial, this was a production system, making the issue critical.

Let's troubleshoot the issue...

Step 1: Check the Node Addition Activity logs for any error messages.

Let's check the Node addition activity for any error messages since this was the only change in the environment.

The runcluvfy for nodeadd was executed to ensure cluster configuration after node addition is correct.

cluvfy stage -post nodeadd -n testdbnode1,testdbnode2 -verbose

Both nodes were added correctly in the cluster. No issues observed.

Step 2: Now check the UNDO configuration across all nodes.

As per the error message, the application performs its DML operations on the application tablespace APPTBS_A1. However, during certain transactions, Oracle attempts to use the SYSTEM rollback segment for undo generation.

Oracle does not allow rollback of DML operations on non-SYSTEM tablespaces using the SYSTEM rollback segment, and this is expected behavior, which results in the error "ORA-01552: cannot use system rollback segment for non-system tablespace".

This indicates that Oracle was unable to allocate undo from the configured UNDO tablespace for the instance handling the session. As a fallback, Oracle attempted to use the default SYSTEM rollback segment, which is disallowed for application tablespaces. Therefore, the issue is not with the application tablespace [APPTBS_A1], but with the UNDO availability or configuration at the instance level.

The primary focus of the investigation is:

- Why Oracle internally attempted to use the SYSTEM rollback segment if UNDO tablespace of RAC instances are available ?

- What is incorrect or unavailable with the existing UNDO tablespaces on the affected RAC instances ?

It is important to note that UNDO tablespaces are configured for all four RAC instances; however, despite this configuration, one or more instances were unable to use their assigned UNDO tablespace during runtime, leading to the intermittent failures.

Let us review the existing UNDO tablespace configuration in the database. The initial investigation focuses on verifying whether UNDO_MANAGEMENT is set to MANUAL.

SQL> select inst_id,name,value from gv$parameter where name like '%undo%' order by 1;

INST_ID NAME VALUE

------- ------------------- --------

1 undo_management AUTO

2 undo_management AUTO

3 undo_management AUTO

4 undo_management AUTO

1 undo_tablespace UNDOTBS1

2 undo_tablespace UNDOTBS2

3 undo_tablespace UNDOTBS3

4 undo_tablespace UNDOTBS4

1 undo_retention 1800

2 undo_retention 1800

3 undo_retention 1800

4 undo_retention 1800

In this case, the UNDO_MANAGEMENT parameter is set to AUTO, confirming that automatic undo management is enabled.

What should be the next course of action?

The initial question that comes to mind is: although UNDO_MANAGEMENT is set to AUTO on all RAC instances, why did this error occur?

Let's re-create the UNDO tablespaces for all RAC instances and drop the existing ones.

Step 3: Create new UNDO tablespace for all RAC instances and drop existing UNDO tablespace.

SQL> CREATE UNDO TABLESPACE UNDOTBS1_NEW DATAFILE '+DG_DATA' size 31g;

SQL> CREATE UNDO TABLESPACE UNDOTBS2_NEW DATAFILE '+DG_DATA' size 31g;

SQL> CREATE UNDO TABLESPACE UNDOTBS3_NEW DATAFILE '+DG_DATA' size 31g;

SQL> CREATE UNDO TABLESPACE UNDOTBS4_NEW DATAFILE '+DG_DATA' size 31g;

SQL> ALTER SYSTEM SET UNDO_TABLESPACE=UNDOTBS1_NEW sid='testdb1' scope=spfile;

SQL> ALTER SYSTEM SET UNDO_TABLESPACE=UNDOTBS2_NEW sid='testdb2' scope=spfile;

SQL> ALTER SYSTEM SET UNDO_TABLESPACE=UNDOTBS3_NEW sid='testdb3' scope=spfile;

SQL> ALTER SYSTEM SET UNDO_TABLESPACE=UNDOTBS4_NEW sid='testdb4' scope=spfile;

SQL> set lines 300 pages 3000

SQL> col name for a16

SQL> SELECT r.name,b.status FROM v$rollname r,v$rollstat s  WHERE r.usn = s.usn AND r.name IN ( SELECT segment_name FROM dba_segments WHERE tablespace_name = '&TBS' );

SQL> alter system kill session '&sid,&serial,@&inst_id' immediate;

SQL> DROP TABLESPACE UNDOTBS1 INCLUDING CONTENTS AND DATAFILES;

SQL> DROP TABLESPACE UNDOTBS2 INCLUDING CONTENTS AND DATAFILES;

SQL> DROP TABLESPACE UNDOTBS3 INCLUDING CONTENTS AND DATAFILES;

SQL> DROP TABLESPACE UNDOTBS4 INCLUDING CONTENTS AND DATAFILES;

SQL> SHUT IMMEDIATE;

SQL> STARTUP;

SQL>  select inst_id,name,value from gv$parameter where name like '%undo%' order by 1;

INST_ID NAME                VALUE
------- ------------------- --------
      1 undo_management     AUTO
      2 undo_management     AUTO
      3 undo_management     AUTO
      4 undo_management     AUTO
	  
      1 undo_tablespace     UNDOTBS1_NEW
      2 undo_tablespace     UNDOTBS2_NEW
      3 undo_tablespace     UNDOTBS3_NEW
      4 undo_tablespace     UNDOTBS4_NEW

      1 undo_retention      1800
      2 undo_retention      1800
      3 undo_retention      1800
      4 undo_retention      1800

Even after re-creating the UNDO tablespace, the application team continued to encounter the same issue. This raises the question: what should be the next course of action, and what could be the underlying cause?

Step 4: Check if this database is cloned database.

One expected scenario for this type of error is when the database is a cloned database. In such cases, any data modification outside the SYSTEM tablespace results in this error because only the SYSTEM rollback segment can be online in a cloned database.

In such cases, you may need to apply the following workaround to resolve the issue.

sql> shutdown immediate;

# set below parameters in pfile.

undo_management=manual

rollback_segments=(ROLLBACK_PRIV_01, ROLLBACK_PRIV_02)

sql> startup mount;

sql> create private rollback segment ROLLBACK_PRIV_01 tablespace undotbs1;

sql> create private rollback segment ROLLBACK_PRIV_02 tablespace undotbs1;

sql> alter database open;

Here, as checked with concern DBA team, they confirmed that this database was not a cloned database. Therefore, the above workaround is not applicable and will not resolve the issue.

This leads to the next question: what should be investigated next, and what could be the underlying cause of this behavior?

Step 5: Let's enable system level tracing at DB level to capture this ORA-01552 error message.

SQL> alter system set timed_statistics=true;

SQL> alter system set statistics_level=all;

SQL> alter system set max_dump_file_size=unlimited;

SQL> alter system set events '1552 trace name errorstack level 3';

Request the application team to reproduce the issue, and once the reproduction is complete, disable tracing.

SQL> alter system set events '1552 trace name errorstack off';

#Check all instances alert log file for any ORA_ messages.

[oracle@testdbnode1 trace]$ grep -i ORA-01552 alert_testdb1.log

[oracle@testdbnode1 trace]$

[oracle@testdbnode2 trace]$ grep -i ORA-01552 alert_testdb2.log

[oracle@testdbnode2 trace]$

[oracle@testdbnode3 trace]$ grep -i ORA-01552 alert_testdb3.log

[oracle@testdbnode3 trace]$

[oracle@testdbnode4 trace]$ grep -i ORA-01552 alert_testdb4.log

[oracle@testdbnode4 trace]$

It was observed that no ORA- error messages were reported in the database alert log files across all RAC instances; however, the same error continued to appear in the application logs.

In this scenario, enabling database tracing would not be effective. Since the ORA- error is not recorded in the database alert logs, no corresponding trace files are generated at the database level. As a result, tracing does not provide additional diagnostic information when the error is observed only at the application front end and not within the database logs.

Step 6: Let us check whether any space-related jobs are enabled in the database that could be causing this issue.

As observed, the Oracle default maintenance jobs are enabled in the database. Let us disable these jobs and monitor whether the issue reoccurs.

SQL> set lines 300 pages 3000

SQL> col CLIENT_NAME for a40

SQL> col status for a15

SQL> select distinct CLIENT_NAME ,status from dba_autotask_client;

CLIENT_NAME STATUS

------------------------------------ --------

sql tuning advisor ENABLED

auto space advisor ENABLED

auto optimizer stats collection ENABLED

#To disable default maintenance tasks

BEGIN

dbms_auto_task_admin.disable(

client_name => 'sql tuning advisor',

operation => NULL,

window_name => NULL);

END;

BEGIN

dbms_auto_task_admin.disable(

client_name => 'auto space advisor',

operation => NULL,

window_name => NULL);

END;

BEGIN

dbms_auto_task_admin.disable(

client_name => 'auto optimizer stats collection',

operation => NULL,

window_name => NULL);

END;

SQL> set lines 300 pages 3000

SQL> col CLIENT_NAME for a40

SQL> col status for a15

SQL> select distinct CLIENT_NAME ,status from dba_autotask_client;

CLIENT_NAME STATUS

--------------------------------- ---------

sql tuning advisor DISABLED

auto space advisor DISABLED

auto optimizer stats collection DISABLED

Even after disabling the default maintenance jobs, the application continued to encounter the same error.

At this stage:

- All standard troubleshooting steps have been performed.

- The application team and customer are under significant stress.

- The issue is affecting production, and the customer is understandably unhappy.

Observations:

- The error persists intermittently despite UNDO tablespace recreation, database restart, and disabling maintenance jobs.

- No ORA- messages are observed in the database alert logs.

- Tracing at the database level does not help because the error is visible only at the application front end.

What is the next course of action to troubleshoot and achieve a permanent resolution of this issue?

Step 7: Let's check the UNDO tablespace space pressure.

SQL> set lines 300 pages 3000

SQL> col file_name for a65

SQL> col TABLESPACE_NAME for a30

SQL> SELECT A.TABLESPACE_NAME,round(SUM(A.SUM_TOT)/1024/1024) "Total size MB",

round(SUM(A.SUM_BYTES)/1024/1024) "Total Free MB",

(SUM(A.SUM_BYTES)*100/SUM(A.SUM_TOT)) "%FREE",

100-(SUM(A.SUM_BYTES)*100/SUM(A.SUM_TOT)) "%USED",

round(SUM(A.LARGEST)/1024/1024) MAX_FREE,SUM(A.CHUNKS) CHUNKS_FREE

FROM (

SELECT TABLESPACE_NAME,0 SUM_TOT,SUM(BYTES) SUM_BYTES,

MAX(BYTES) LARGEST,COUNT(*) CHUNKS

FROM SYS.DBA_FREE_SPACE A

GROUP BY TABLESPACE_NAME

UNION

SELECT TABLESPACE_NAME,SUM(BYTES) SUM_TOT,0,0,0

FROM SYS.DBA_DATA_FILES

GROUP BY TABLESPACE_NAME) A, V$INSTANCE B

GROUP BY UPPER(B.INSTANCE_NAME),A.TABLESPACE_NAME having (100-(SUM(A.SUM_BYTES)*100/SUM(A.SUM_TOT))) > 0

TABLESPACE_NAME Total size MB Total Free MB %FREE %USED MAX_FREE CHUNKS_FREE

--------------- ------------- ------------- ---------- ---------- -------- -----------

SYSTEM 11160 9922 88.90625 11.09375 3968 6

USERS 2053 2044 99.5433512 .456648807 2043 3

SYSAUX 43180 26484 61.3342404 38.6657596 3968 10

UNDOTBS1_NEW 383170 380971 99.4261196 .573880393 3968 864

UNDOTBS2_NEW 384550 382471 99.4593356 .540664413 3968 865

UNDOTBS3_NEW 319600 318050 99.5149797 .485020338 3968 274

UNDOTBS4_NEW 447860 445785 99.5366018 .46339816 3968 351

You can use the above script to monitor continuous free space in the UNDO tablespace. This confirms that, in this case, the UNDO tablespace was not under pressure during the issue window. The script can be modified as per the requirement.

What should be the next course of action?

Step 8: Let's check the rollback segment count for all RAC DB instances.

SQL> select status,count(*),TABLESPACE_NAME from dba_rollback_segs group by STATUS,TABLESPACE_NAME order by 1;

STATUS COUNT(*) TABLESPACE_NAME

---------- ---------- ----------------

ONLINE 1 SYSTEM

ONLINE 375 UNDOTBS1_NEW

ONLINE 360 UNDOTBS2_NEW

OFFLINE 11727 UNDOTBS1_NEW

OFFLINE 11000 UNDOTBS2_NEW

OFFLINE 9297 UNDOTBS3_NEW

Here, the total count (1+375+360+11727+11000+9297) is = 32760

Please note that there is the hard limit of the total undo segments(ONLINE+OFFLINE) across all undo tablespaces in the entire database. This is also the hard limit of all the Instances of RAC database. The limit is 32760 and is defined by the underscore parameter "_rollback_segment_count". This is instance specific parameter. The "_rollback_segment_count" parameter controls the number of undo segments that remain online over time, helping to balance workload across multiple segments.

Here, we can observe that the number of offline rollback segments is higher on Node1, Node2, and Node3. This caused the "_rollback_segment_count" parameter to reach its hard limit, which may have led Oracle to use the SYSTEM tablespace for rollback operations.

If this is the case, what would be the appropriate solution to resolve the issue?

Step 9: Let's apply a workaround to fix the issue. Please perform the following steps during non-business hours, when system load is minimal, or during an approved downtime window.

#Check the total number of rollback segment count for all instances.

SQL> select status,count(*),TABLESPACE_NAME from dba_rollback_segs group by STATUS,TABLESPACE_NAME order by 1;

STATUS COUNT(*) TABLESPACE_NAME

---------- ---------- ----------------

ONLINE 1 SYSTEM

ONLINE 375 UNDOTBS1_NEW

ONLINE 360 UNDOTBS2_NEW

OFFLINE 11727 UNDOTBS1_NEW

OFFLINE 11000 UNDOTBS2_NEW

OFFLINE 9297 UNDOTBS3_NEW

#Enable SMU Debug mode.

SQL> alter system set "_smu_debug_mode"=44;

#Execute below dynamic SQL to list too many offline rollback segments.

SQL> select 'drop rollback segment ' ||segment_name|| ';' from dba_rollback_segs where status = 'OFFLINE' and tablespace_name in('UNDOTBS1_NEW','UNDOTBS2_NEW','UNDOTBS3_NEW','UNDOTBS4_NEW');

#Execute above dynamic command and drop the offline segments.

#Check the rollback segment count again.

SQL> select status,count(*),TABLESPACE_NAME from dba_rollback_segs group by STATUS,TABLESPACE_NAME order by 1;

STATUS       COUNT(*) TABLESPACE_NAME

---------- ---------- ---------------

ONLINE              1 SYSTEM

ONLINE            868 UNDOTBS1_NEW

ONLINE            191 UNDOTBS2_NEW

ONLINE            278 UNDOTBS3_NEW

ONLINE            283 UNDOTBS4_NEW

OFFLINE            23 UNDOTBS3_NEW

OFFLINE             7 UNDOTBS4_NEW

#Disable SMU debug mode in memory.

SQL> alter system set "_smu_debug_mode"=0; 

#Disable SMU debug mode in spfile. Needs a DB restart.

SQL> alter system reset "_smu_debug_mode" scope=spfile;

SQL> shut immediate;

SQL> startup;

Step 10: Now ask the Application team to monitor and report if ORA-01552 occurs again.

✅ Issue Resolution Confirmation:

After applying the above workaround, the database was closely monitored for over one week, and no reoccurrence of the issue was observed.

The Application team also confirmed that there were no further transaction declines, and they have not encountered the ORA-01552 error again.

📌 Final Conclusion:

This clearly indicates that the issue was successfully resolved by dropping the excessive offline rollback segments from DB, thereby stabilizing undo management and preventing Oracle from using the SYSTEM tablespace for rollback operations.

🎉 Enjoy the troubleshooting journey!!!

📝 Stay tuned for a detailed blog post on this case !!!

Thanks for reading this post ! Please comment if you like this post ! Click FOLLOW to get future blog updates !

ORA-01552: cannot use system rollback segment for non-system tablespace

Let's troubleshoot the issue...

Post a Comment

Oracle Enterprise Manager 24ai Installation on Linux 8

Oracle Enterprise Manager 24ai Installation on Linux 8

Oracle DBA ++

Followers

Search This Blog

About Me

Contact form

ORA-01552: cannot use system rollback segment for non-system tablespace

Let's troubleshoot the issue...

You may like these posts

Post a Comment

Contact form