Environment:
Database: Oracle 3-Node RAC 19.27
Operating System: Windows Server

Issue Description:
The Oracle database instance failed to start with the error messages shown below. The entire cluster had been shut down manually to troubleshoot frequent network issues. Although the cluster stack started successfully, the RDBMS instance failed to start and reported the following errors.

CRS-2672: Attempting to start 'ora.testdb.db' on 'testdb3'
CRS-5017: The resource action "ora.testdb.db start" encountered the following error:
ORA-03113: end-of-file on communication channel
Process ID: 0
Session ID: 0 Serial number: 0
. For details refer to "(:CLSN00107:)" in "D:\oracle\app\grid\diag\crs\testdb3\crs\trace\crsd_oraagent_system.trc".
CRS-2674: Start of 'ora.testdb.db' on 'testdb3' failed
CRS-2672: Attempting to start 'ora.ons' on 'testdb2'
CRS-2674: Start of 'ora.ons' on 'testdb2' failed
CRS-2672: Attempting to start 'ora.testdb.db' on 'testdb3'

Changes in the environment:
The Network Team provided a different Ethernet card for the private interconnect IP on Node3. This was the only change made on Node3 to troubleshoot the private interconnect fluctuations.

Troubleshooting Steps:

Step 1: The private interconnect configuration was verified across all cluster nodes, including Ethernet card names, IP addresses, subnet masks, gateways, and DNS IPs. No issues were observed during this verification.

Step 2: The database was started manually to check the issue.

C:\Windows\system32>sqlplus / as sysdba
SQL> startup
ORA-03113: end-of-file on communication channel

The issue recurred even when the instance was started manually.

Step 3: The server was rebooted to investigate further and the cluster stack was started with crsctl start crs -wait. The command was eventually aborted with <Ctrl+C> because CRS kept retrying the database instance start.
C:\Windows\system32>crsctl start crs -wait
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-2672: Attempting to start 'ora.mdnsd' on 'testdb03'
CRS-2672: Attempting to start 'ora.evmd' on 'testdb03'
CRS-2676: Start of 'ora.mdnsd' on 'testdb03' succeeded
CRS-2676: Start of 'ora.evmd' on 'testdb03' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'testdb03'
CRS-2676: Start of 'ora.gpnpd' on 'testdb03' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'testdb03'
CRS-2676: Start of 'ora.gipcd' on 'testdb03' succeeded
CRS-2672: Attempting to start 'ora.crf' on 'testdb03'
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'testdb03'
CRS-2676: Start of 'ora.crf' on 'testdb03' succeeded
CRS-2676: Start of 'ora.cssdmonitor' on 'testdb03' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'testdb03'
CRS-2676: Start of 'ora.cssd' on 'testdb03' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'testdb03'
CRS-2676: Start of 'ora.ctssd' on 'testdb03' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'testdb03'
CRS-2676: Start of 'ora.asm' on 'testdb03' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'testdb03'
CRS-2676: Start of 'ora.storage' on 'testdb03' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'testdb03'
CRS-2676: Start of 'ora.crsd' on 'testdb03' succeeded
CRS-6017: Processing resource auto-start for servers: testdb03
CRS-2673: Attempting to stop 'ora.testdb03.vip' on 'testdb02'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'testdb01'
CRS-2672: Attempting to start 'ora.ons' on 'testdb03'
CRS-2677: Stop of 'ora.testdb03.vip' on 'testdb02' succeeded
CRS-2672: Attempting to start 'ora.testdb03.vip' on 'testdb03'
CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'testdb01' succeeded
CRS-2673: Attempting to stop 'ora.scan1.vip' on 'testdb01'
CRS-2677: Stop of 'ora.scan1.vip' on 'testdb01' succeeded
CRS-2672: Attempting to start 'ora.scan1.vip' on 'testdb03'
CRS-5052: invalid host name or IP address '10.189.197.25'
CRS-2676: Start of 'ora.testdb03.vip' on 'testdb03' succeeded
CRS-2672: Attempting to start 'ora.LISTENER.lsnr' on 'testdb03'
CRS-2676: Start of 'ora.scan1.vip' on 'testdb03' succeeded
CRS-2672: Attempting to start 'ora.LISTENER_SCAN1.lsnr' on 'testdb03'
CRS-2676: Start of 'ora.LISTENER.lsnr' on 'testdb03' succeeded
CRS-33672: Attempting to start resource group 'ora.asmgroup' on server 'testdb03'
CRS-2672: Attempting to start 'ora.asmnet1.asmnetwork' on 'testdb03'
CRS-2676: Start of 'ora.asmnet1.asmnetwork' on 'testdb03' succeeded
CRS-2672: Attempting to start 'ora.ASMNET1LSNR_ASM.lsnr' on 'testdb03'
CRS-2676: Start of 'ora.LISTENER_SCAN1.lsnr' on 'testdb03' succeeded
CRS-2676: Start of 'ora.ASMNET1LSNR_ASM.lsnr' on 'testdb03' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'testdb03'
CRS-2676: Start of 'ora.ons' on 'testdb03' succeeded
CRS-2676: Start of 'ora.asm' on 'testdb03' succeeded
CRS-33676: Start of resource group 'ora.asmgroup' on server 'testdb03' succeeded.
CRS-2672: Attempting to start 'ora.DG_ARCH.dg' on 'testdb03'
CRS-2672: Attempting to start 'ora.DG_DATA.dg' on 'testdb03'
CRS-2672: Attempting to start 'ora.DG_REDO2.dg' on 'testdb03'
CRS-2672: Attempting to start 'ora.DG_REDO1.dg' on 'testdb03'
CRS-2672: Attempting to start 'ora.ons' on 'testdb02'
CRS-2676: Start of 'ora.DG_ARCH.dg' on 'testdb03' succeeded
CRS-2676: Start of 'ora.DG_REDO2.dg' on 'testdb03' succeeded
CRS-2676: Start of 'ora.DG_DATA.dg' on 'testdb03' succeeded
CRS-2676: Start of 'ora.DG_REDO1.dg' on 'testdb03' succeeded
CRS-2674: Start of 'ora.ons' on 'testdb02' failed
CRS-2679: Attempting to clean 'ora.testdb.db' on 'testdb03'
CRS-2681: Clean of 'ora.testdb.db' on 'testdb03' succeeded
CRS-2672: Attempting to start 'ora.testdb.db' on 'testdb03'
CRS-5017: The resource action "ora.testdb.db start" encountered the following error:
ORA-03113: end-of-file on communication channel
Process ID: 0
Session ID: 0 Serial number: 0
. For details refer to "(:CLSN00107:)" in "D:\oracle\app\grid\diag\crs\testdb03\crs\trace\crsd_oraagent_system.trc".
CRS-2674: Start of 'ora.testdb.db' on 'testdb03' failed
CRS-2672: Attempting to start 'ora.ons' on 'testdb02'
CRS-2674: Start of 'ora.ons' on 'testdb02' failed
CRS-2672: Attempting to start 'ora.testdb.db' on 'testdb03'
^C
C:\Windows\system32>

The issue persisted even after the server reboot.

DB Alert Log:

2025-08-14T00:18:59.264174+05:30
My CSS node number is 3
My CSS hostname is iprblcdb03
lmon registered with NM - instance number 3 (internal mem no 2)
2025-08-14T00:19:20.702698+05:30
Detected partial connectivity during CGS reconfiguration. No connectivity to instances: 1 2
As it is starting up, terminate the instance.
LMON (ospid: 19464): terminating the instance due to ORA error 481
Cause - 'Instance is being terminated by LMON due to partial connectivity'
2025-08-14T00:19:20.766701+05:30
System state dump requested by (instance=3, osid=19464 (LMON)), summary=[abnormal instance termination].
error - 'Instance is terminating.'
System State dumped to trace file D:\ORACLE\APP\ORACLE\diag\rdbms\testdb\testdb3\trace\testdb3_diag_17636.trc
2025-08-14T00:19:26.025616+05:30
Instance terminated by LMON, pid = 19464
2025-08-14T00:41:39.128615+05:30

As seen in the database alert log, the instance was terminated by the LMON process during startup. Let's review the LMON trace file for any related error messages.
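The LMON trace sits in the same ADR trace directory that the alert log references above, and it follows the usual <instance>_lmon_<ospid>.trc naming convention. A minimal sketch of locating the most recent one, reusing the diagnostic path shown in the alert log:

#List LMON trace files for the testdb3 instance, newest first.
C:\Windows\system32>dir /O-D "D:\ORACLE\APP\ORACLE\diag\rdbms\testdb\testdb3\trace\testdb3_lmon_*.trc"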
testdb3_lmon_25556.trc

================================
== System Network Information ==
================================
==[ Network Interface Query not supported ]==
== [ Network Transport Usage (ksipc: avail[x28] sel[UDP]) (IPv4) ] ==
kjxgmConnCheckAllInst_CB: dump the CGS communication info.
my inst 3 (node 2) inc (376 4) send-window 500 flags 0x8c100002 (b.)
ats-threshold 250
reg msgs: sz 504 blks 1 alloc 100 free 100
big msgs: sz 4112 blks 1 alloc 100 free 100
rcv msgs: sz 4112 blks 8 alloc 2000 free 0
total rcv msgs 11 size 1920 avgsz 174 (in bytes) total rcv bmsgs 0 embedded msgs 0 avg 0
total prc msgs 17 pending sends 0
conn[inst 1] fg 0x0 snds (14:0) qmsgs (0:0) bths (0:0) unas 8 unar 2:0 nreq 0 nack (0:0)
conn[inst 2] fg 0x0 snds (8:0) qmsgs (0:0) bths (0:0) unas 6 unar 1:0 nreq 0 nack (0:0)
conn[inst 3] fg 0x0 snds (6:0) qmsgs (0:0) bths (0:0) unas 0 unar 0:0 nreq 0 nack (0:0)
==============================
Instance Interface Information
==============================
num of pending ping requests = 0
inst 1 nifs 0 overall state (0x20:succ)
inst 2 nifs 0 overall state (0x20:succ)
inst 3 nifs 1 (our node) real IP 10.191.137.133
Detected partial connectivity during CGS reconfig
No connectivity to instances: 1 2
Terminating the instance
2025-08-13 23:25:20.028 :kjzduptcctx(): Notifying DIAG for crash event
----- Abridged Call Stack Trace -----
ksedsts()+485<-kjzdssdmp()+270<-kjzduptcctx()+944<-kjzdicrshnfy()+1462<-ksuitm_opt()+2163<-kjxgmConnCheckAllInst_CB()+584<-kjxgm_ConnectivityReq_InvokeCB()+554<-kjxgm_ConnectivityReq_Process()+221<-kjxgmpoll()+1472<-kjxggpoll()+488<-kjfmact()+360<-kjfcln()+11511<-ksbrdp()+1548<-opirip()+631<-opidrv()+851<-sou2o()+110<-opimai_real()+283<-opimai()+268<-BackgroundThreadStart()+580<-0x00007FFEEB007AC4
----- End of Abridged Call Stack Trace -----
Partial short call stack signature: 0x577baf1a2d9e368f
*** 2025-08-13T23:25:20.048779+05:30
LMON (ospid: 25556): terminating the instance due to ORA error 481
Cause - 'Instance is being terminated by LMON due to partial connectivity'

The LMON trace confirms that the Node3 RDBMS instance could not communicate with instances 1 and 2 over the private interconnect. LMON detected this partial connectivity during the CGS reconfiguration and terminated the Node3 instance while it was starting up, which is why the database failed to start. Everything therefore pointed to a problem on the private interconnect network.

The newly configured Ethernet card assigned to the private interconnect on Node3 was reviewed for misconfigurations or hardware issues, and ping tests were executed against all private interconnect IPs from all nodes to verify end-to-end connectivity and check for packet loss.

Normal Ping:
C:\Windows\system32>ping Private_IP_Node1
C:\Windows\system32>ping Private_IP_Node2
C:\Windows\system32>ping Private_IP_Node3

MTU Ping:
C:\Windows\system32>ping -f -l 9000 Private_IP_Node1
C:\Windows\system32>ping -f -l 9000 Private_IP_Node2
C:\Windows\system32>ping -f -l 9000 Private_IP_Node3

C:\Windows\system32>ping -f -l 9000 Private_IP_Node3
Pinging <Private_IP_Node3> with 9000 bytes of data:
Packet needs to be fragmented but DF set.
Packet needs to be fragmented but DF set.

The normal ping tests were successful, but the ping with the large packet size and the Don't Fragment flag (-f) failed only for Node3. As a result, the MTU setting of the private interconnect Ethernet card on Node3 was cross-checked for a potential misconfiguration.
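One caveat worth noting: on Windows, the -l switch sets only the ICMP payload size, and the IP and ICMP headers add another 28 bytes on top of it. A stricter variant of the same test, assuming a 9000-byte MTU target, subtracts that overhead:

#9000 (MTU) - 20 (IP header) - 8 (ICMP header) = 8972 bytes of payload.
C:\Windows\system32>ping -f -l 8972 Private_IP_Node1
C:\Windows\system32>ping -f -l 8972 Private_IP_Node2
C:\Windows\system32>ping -f -l 8972 Private_IP_Node3

Either way, the telling symptom here is the contrast between nodes: the same large-packet ping behaved differently only against Node3.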
#Command to check the MTU of all Ethernet cards on Windows Server.
C:\Windows\system32>netsh interface ipv4 show subinterface

       MTU  MediaSenseState   Bytes In   Bytes Out  Interface
----------  ---------------  ---------  ----------  -------------
4294967295                1          0       49704  Loopback Pseudo-Interface 1
      9000                1  170675828   217183734  pubnet
      9000                1    1034136      259307  Ethernet2
      1500                1  448185024   269497005  privnet

The issue was an MTU mismatch. The newly configured private interconnect Ethernet card on Node3 had its MTU set to 1500, whereas the expected value across all nodes was 9000. Node1 and Node2 were correctly configured with an MTU of 9000, while Node3 remained at 1500, and this mismatch caused the private interconnect communication failures.

Steps to resolve the MTU mismatch issue:

Stop the cluster again on Node3 and change the MTU value to 9000 for the privnet Ethernet card.

#Stop the cluster on Node3.
C:\Windows\system32>crsctl stop crs -f

#Set the MTU to 9000 for the private interconnect card. Coordinate with the System Admin team before changing the MTU value.
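The exact command depends on how the MTU is managed on the server (NIC driver properties vs. the IP stack). A minimal sketch, assuming the interface is named privnet as shown above and the change is made at the IP stack level:

#Hypothetical example: persistently set the MTU of the "privnet" interface to 9000.
C:\Windows\system32>netsh interface ipv4 set subinterface "privnet" mtu=9000 store=persistent

Depending on the NIC driver, jumbo frames may also need to be enabled in the adapter's advanced properties, which is why this step is coordinated with the System Admin team.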
#Verify the MTU value after the change.
C:\Windows\system32>netsh interface ipv4 show subinterface

       MTU  MediaSenseState   Bytes In   Bytes Out  Interface
----------  ---------------  ---------  ----------  -------------
4294967295                1          0       81428  Loopback Pseudo-Interface 1
      9000                1  177182139   226258472  pubnet
      9000                1       8233       14568  privnet
      9000                1    1091504      300327  Ethernet2
      1500                1  458750868   275851852  privnet_new

#Start the cluster.
C:\Windows\system32>crsctl start crs -wait
CRS-4123: Starting Oracle High Availability Services-managed resources
CRS-2672: Attempting to start 'ora.mdnsd' on 'testdb03'
CRS-2672: Attempting to start 'ora.evmd' on 'testdb03'
CRS-2676: Start of 'ora.mdnsd' on 'testdb03' succeeded
CRS-2676: Start of 'ora.evmd' on 'testdb03' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'testdb03'
CRS-2676: Start of 'ora.gpnpd' on 'testdb03' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'testdb03'
CRS-2676: Start of 'ora.gipcd' on 'testdb03' succeeded
CRS-2672: Attempting to start 'ora.crf' on 'testdb03'
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'testdb03'
CRS-2676: Start of 'ora.crf' on 'testdb03' succeeded
CRS-2676: Start of 'ora.cssdmonitor' on 'testdb03' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'testdb03'
CRS-2676: Start of 'ora.cssd' on 'testdb03' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'testdb03'
CRS-2676: Start of 'ora.ctssd' on 'testdb03' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'testdb03'
CRS-2676: Start of 'ora.asm' on 'testdb03' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'testdb03'
CRS-2676: Start of 'ora.storage' on 'testdb03' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'testdb03'
CRS-2676: Start of 'ora.crsd' on 'testdb03' succeeded
CRS-6017: Processing resource auto-start for servers: testdb03
CRS-2673: Attempting to stop 'ora.testdb03.vip' on 'testdb02'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'testdb01'
CRS-2672: Attempting to start 'ora.ons' on 'testdb03'
CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'testdb01' succeeded
CRS-2673: Attempting to stop 'ora.scan1.vip' on 'testdb01'
CRS-2677: Stop of 'ora.testdb03.vip' on 'testdb02' succeeded
CRS-2672: Attempting to start 'ora.testdb03.vip' on 'testdb03'
CRS-2677: Stop of 'ora.scan1.vip' on 'testdb01' succeeded
CRS-2672: Attempting to start 'ora.scan1.vip' on 'testdb03'
CRS-2676: Start of 'ora.testdb03.vip' on 'testdb03' succeeded
CRS-2672: Attempting to start 'ora.LISTENER.lsnr' on 'testdb03'
CRS-2676: Start of 'ora.scan1.vip' on 'testdb03' succeeded
CRS-2672: Attempting to start 'ora.LISTENER_SCAN1.lsnr' on 'testdb03'
CRS-2676: Start of 'ora.LISTENER.lsnr' on 'testdb03' succeeded
CRS-33672: Attempting to start resource group 'ora.asmgroup' on server 'testdb03'
CRS-2672: Attempting to start 'ora.asmnet1.asmnetwork' on 'testdb03'
CRS-2676: Start of 'ora.asmnet1.asmnetwork' on 'testdb03' succeeded
CRS-2672: Attempting to start 'ora.ASMNET1LSNR_ASM.lsnr' on 'testdb03'
CRS-2676: Start of 'ora.LISTENER_SCAN1.lsnr' on 'testdb03' succeeded
CRS-2676: Start of 'ora.ASMNET1LSNR_ASM.lsnr' on 'testdb03' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'testdb03'
CRS-2676: Start of 'ora.ons' on 'testdb03' succeeded
CRS-2676: Start of 'ora.asm' on 'testdb03' succeeded
CRS-33676: Start of resource group 'ora.asmgroup' on server 'testdb03' succeeded.
CRS-2672: Attempting to start 'ora.DG_DATA.dg' on 'testdb03'
CRS-2676: Start of 'ora.DG_DATA.dg' on 'testdb03' succeeded
CRS-2672: Attempting to start 'ora.DG_ARCH.dg' on 'testdb03'
CRS-2672: Attempting to start 'ora.DG_REDO1.dg' on 'testdb03'
CRS-2672: Attempting to start 'ora.DG_REDO2.dg' on 'testdb03'
CRS-2672: Attempting to start 'ora.ons' on 'testdb02'
CRS-2676: Start of 'ora.DG_ARCH.dg' on 'testdb03' succeeded
CRS-2676: Start of 'ora.DG_REDO1.dg' on 'testdb03' succeeded
CRS-2676: Start of 'ora.DG_REDO2.dg' on 'testdb03' succeeded
CRS-2674: Start of 'ora.ons' on 'testdb02' failed
CRS-2679: Attempting to clean 'ora.testdb.db' on 'testdb03'
CRS-2681: Clean of 'ora.testdb.db' on 'testdb03' succeeded
CRS-2672: Attempting to start 'ora.testdb.db' on 'testdb03'
CRS-2676: Start of 'ora.testdb.db' on 'testdb03' succeeded
CRS-6016: Resource auto-start has completed for server testdb03
CRS-6024: Completed start of Oracle Cluster Ready Services-managed resources
CRS-4123: Oracle High Availability Services has been started.
C:\Windows\system32>

After the correct MTU value was configured on the newly provided Ethernet card, the RDBMS instance on Node3 started successfully without any further problems.

✅ The database alert log was clean and showed no error messages.
✅ The issue was resolved by setting the correct MTU value and restarting the cluster.
✅ The database started successfully without any further errors.
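For future reference, a few commands help confirm the cluster and database state after a change like this. A minimal sketch, assuming the Grid Infrastructure tools are in PATH and the database name is testdb (as in the resource name ora.testdb.db):

#Check the status of all cluster resources across the nodes.
C:\Windows\system32>crsctl stat res -t

#Check the database instance status on all nodes.
C:\Windows\system32>srvctl status database -d testdb

#Confirm which interface and subnet are registered as the cluster interconnect.
C:\Windows\system32>oifcfg getif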
🎉 Enjoy the troubleshooting journey!!!
📝 Stay tuned for a detailed blog post on this case!!!

Thanks for reading this post! Please comment if you liked it, and click FOLLOW to get future blog updates!