Issue Description:
Network outage at Datacenter due to which all DB servers were suddenly become inaccessible. Once the network issue was resolved and tried to start Oracle RAC cluster of the critical production DB server, CRS was unable to start. So tried to start the CRS from both nodes, but still issue was not resolved.
Upon checking cluster alert log file, It was starting HASD and other services, but it was getting stuck while starting up ASM instance with below errors.
Cluster Alert Log File Snippet:
2023-02-17 11:01:48.882 [CLSECHO(346751)]ACFS-9464: Symbol mpage_readpages=0x217bdd9e: ERROR, SYMBOL NOT FOUND.
2023-02-17 11:01:49.377 [ORAROOTAGENT(343427)]CRS-5016: Process "/oracle_19c_grid/app/oracle/product/19.3/grid_home/bin/acfsload" spawned by agent "ORAROOTAGENT" for action "start" failed: details at "(:CLSN00010:)" in "/oracle_19c_grid/app/grid/diag/crs/rac2/crs/trace/ohasd_orarootagent_root.trc"
2023-02-17 11:01:52.196 [ORAROOTAGENT(343427)]CRS-5016: Process "/oracle_19c_grid/app/oracle/product/19.3/grid_home/bin/acfsload" spawned by agent "ORAROOTAGENT" for action "check" failed: details at "(:CLSN00010:)" in "/oracle_19c_grid/app/grid/diag/crs/rac2/crs/trace/ohasd_orarootagent_root.trc"
2023-02-17 11:02:00.128 [OLOGGERD(347058)]CRS-8500: Oracle Clusterware OLOGGERD process is starting with operating system process ID 347058
2023-02-17 11:02:09.353 [ORAROOTAGENT(343427)]CRS-5019: All OCR locations are on ASM disk groups [OCR_VOTE], and none of these disk groups are mounted. Details are at "(:CLSN00140:)" in "/oracle_19c_grid/app/grid/diag/crs/rac2/crs/trace/ohasd_orarootagent_root.trc".
.....
2023-02-17 11:03:34.317 [ORAAGENT(343549)]CRS-5011: Check of resource "ora.asm" failed: details at "(:CLSN00006:)" in "/oracle_19c_grid/app/grid/diag/crs/rac2/crs/trace/ohasd_oraagent_grid.trc"
2023-02-17 11:03:34.319 [ORAAGENT(343549)]CRS-5011: Check of resource "ora.asm" failed: details at "(:CLSN00006:)" in "/oracle_19c_grid/app/grid/diag/crs/rac2/crs/trace/ohasd_oraagent_grid.trc"
2023-02-17 11:03:46.083 [ORAROOTAGENT(343427)]CRS-5019: All OCR locations are on ASM disk groups [OCR_VOTE], and none of these disk groups are mounted. Details are at "(:CLSN00140:)" in "/oracle_19c_grid/app/grid/diag/crs/rac2/crs/trace/ohasd_orarootagent_root.trc".
Environment:
- 2 Node RAC Cluster
- Oracle 19c RAC on Linux
Issue Resolution:
What do you think ? What can be the issue ?
The above voting disk group errors can be post facto since ASM instance is not up. All disks were physically located in correct directory with proper permissions.
I thought let's try to start the ASM instance manually. At this stage I got below error message while starting up ASM instance.
$grid
$sqlplus / as sysasm
SQL> startup;
ORA-32004: obsolete or deprecated parameter(s) specified for ASM instance
ORA-15021: parameter "long_module_action" is not valid in ASM instance
SQL>
Something new ? Did this parameter create problem ?
I checked this parameter in ASM alert log file, but I did not get it. I checked other environment also but I did not find such parameter in ASM parameter file.
What do you think ? What can be the issue ?
At this stage, I decided to create new pfile for ASM instance for which I captured few parameters from ASM alert log file.
$grid
$vi asm_pfile.ora
large_pool_size = 12M
remote_login_passwordfile= "EXCLUSIVE"
asm_diskstring = "/dev/oracleasm/disks/*"
asm_diskgroups = "FRA"
asm_diskgroups = "REDO"
asm_diskgroups = "DATA"
asm_power_limit = 1
$sqlplus / as sysasm
SQL> startup pfile='/home/grid/asm_pfile.ora';
After executing above command, ASM instance started successfully.
Got confused now. Issue was resolved, but what can be the RCA. What was the issue ?
Now I decided to create spfile from newly created pfile, but before creating spfile I thought I should take backup of existing spfile.
ASMCMD [+OCR_VOTE/TEST/ASMPARAMETERFILE] > ls -l
Type Redund Striped Time Sys Name
ASMPARAMETERFILE UNPROT COARSE FEB 17 11:00:00 Y REGISTRY.253.1117484719
Here is the problem. Below error messages were appeared on screen while copying existing ASM spfile to same or different location.
ASMCMD [+OCR_VOTE/TEST/ASMPARAMETERFILE] > cp REGISTRY.253.1117484719 /oracle_19c_grid/app/grid/diag/crs/rac2/crs/trace/REGISTRY.253.1117484719_bkp_17feb2023
copying +OCR_VOTE/TEST/ASMPARAMETERFILE/REGISTRY.253.1117484719 -> /oracle_19c_grid/app/grid/diag/crs/rac2/crs/trace/REGISTRY.253.1117484719_bkp_17feb2023
ASMCMD-8016: copy source '+OCR_VOTE/TEST/ASMPARAMETERFILE/REGISTRY.253.1117484719' and target '/oracle_19c_grid/app/grid/diag/crs/rac2/crs/trace/REGISTRY.253.1117484719_bkp_17feb2023' failed
ORA-19505: failed to identify file "+OCR_VOTE/TEST/ASMPARAMETERFILE/REGISTRY.253.1117484719"
ORA-17503: ksfdopn:2 Failed to open file +OCR_VOTE/TEST/ASMPARAMETERFILE/REGISTRY.253.1117484719
ORA-15308: could not access ASM SPFILE in RDBMS instance
ORA-06512: at "SYS.X$DBMS_DISKGROUP", line 617
ORA-06512: at line 3 (DBD ERROR: OCIStmtExecute)
ASMCMD [+OCR_VOTE/TEST/ASMPARAMETERFILE] >
The ASM spfile was corrupted and due to which ASM instance was not coming up.
The file was corrupted and hence it was giving error for such false parameter.
The corrupted spfile was removed from the location and new spfile was created.
ASMCMD [+OCR_VOTE/TEST/ASMPARAMETERFILE] > rm REGISTRY.253.1117484719
SQL> create spfile='+OCR_VOTE' from pfile='/home/grid/asm_pfile.ora';
SQL> shut immediate;
SQL> startup;
SQL> show parameter spfile;
ASMCMD [+OCR_VOTE/TEST/ASMPARAMETERFILE] > ls -l
Type Redund Striped Time Sys Name
ASMPARAMETERFILE UNPROT COARSE FEB 17 11:00:00 Y REGISTRY.253.1129031575
Cluster was restarted in rolling fashion one by one post fixing the above issue which started successfully without any issue.
Thanks for reading this post ! Please comment if you like this post !
Thank you for your comment !