Sunday, October 28, 2012

Test HA 3: Oracle RAC setup

Following the setup of an iSCSI shared disk available to 2 VMs, this time I am installing a test Oracle RAC
- again without any special hardware (everything runs on a simple PC)

 

Interconnect network: 

In RAC parlance this is the private network between the nodes (i.e. the heartbeat network).
On each virtual machine I add a 'Host-only network' interface.
In my case I took the 192.168.160.0/24 network and changed the interfaces from DHCP to static as follows (using system-config-network, for example):

RAC1: eth1 192.168.160.101 / 255.255.255.0
RAC2: eth1 192.168.160.102 / 255.255.255.0


#service network restart
RAC1 $ ping 192.168.160.102
OK
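For reference, this is roughly the static config that system-config-network ends up writing for eth1. The sketch below writes it to a demo file in the current directory; on a real node the file lives at /etc/sysconfig/network-scripts/ifcfg-eth1 (RAC1's values shown, RAC2 gets .102):

```shell
# Sketch: the static eth1 config (RAC1 values). Written to a demo file
# here; copy the content to /etc/sysconfig/network-scripts/ifcfg-eth1
# on the real node, then restart the network service.
cat > ./ifcfg-eth1.demo <<'EOF'
DEVICE=eth1
BOOTPROTO=static
IPADDR=192.168.160.101
NETMASK=255.255.255.0
ONBOOT=yes
EOF
```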

 

Users & system setup


Reference: Database Installation Guide for Linux

On each node:

# yum install gcc elfutils-libelf-devel glibc-devel libaio-devel libstdc++-devel unixODBC unixODBC-devel gcc-c++

# groupadd dba
# groupadd oinstall
# groupadd asmdba
# useradd -m oracle -g oinstall -G dba,asmdba
# passwd oracle

# cat >>  /etc/security/limits.conf
 oracle          hard    nofile          65536
 oracle          soft    nofile          65536

# cat >>  /etc/sysctl.conf
 kernel.sem = 250        32000   100      128
 fs.file-max = 6815744
 net.ipv4.ip_local_port_range = 9000    65500
 net.core.rmem_default = 262144
 net.core.rmem_max = 4194304
 net.core.wmem_default = 262144
 net.core.wmem_max = 4194304
 fs.aio-max-nr = 1048576

# /sbin/sysctl -p
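To confirm the parameters actually took effect after `sysctl -p`, a small checker like this can read back the running values and compare them to what we set above (the helper function name is mine, not an Oracle tool):

```shell
# Sketch: read back kernel parameters and compare to expected values.
# check_param is a hypothetical helper, not part of any Oracle kit.
check_param() {  # usage: check_param <key> <expected value(s)>
    # Normalize whitespace on both sides (sysctl separates with tabs).
    actual=$(sysctl -n "$1" 2>/dev/null | tr -s '[:space:]' ' ' || true)
    expected=$(echo "$2" | tr -s '[:space:]' ' ')
    if [ "$actual" != "$expected" ]; then
        echo "MISMATCH $1: got '$actual', want '$expected'"
    fi
}
check_param fs.aio-max-nr "1048576"
check_param fs.file-max   "6815744"
check_param kernel.sem    "250 32000 100 128"
```

Silence means the values match; any mismatch is printed.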

# mkdir /opt/oracle
# chown oracle.oinstall /opt/oracle/

ssh connection without password

RAC1# su - oracle
RAC1$ ssh-keygen -b 2048
(type enter for an empty passphrase)

Repeat on RAC2 (this creates .ssh directory and private/pub keys)

RAC1$ scp .ssh/id_rsa.pub rac2:/home/oracle/.ssh/authorized_keys
RAC2$ chmod 600 /home/oracle/.ssh/authorized_keys
RAC2$ scp .ssh/id_rsa.pub rac1:/home/oracle/.ssh/authorized_keys
RAC1$ chmod 600 /home/oracle/.ssh/authorized_keys

Then ssh works without password from one node to the other
RAC1$ ssh RAC2

OUI (the Oracle Universal Installer) also needs passwordless ssh from each node to itself, so on each node we also append the node's own public key:
$ cd ~/.ssh && cat id_rsa.pub >> authorized_keys
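The end result of the cross-copies plus the self-append is an authorized_keys on each node containing both public keys, with the permissions sshd insists on. A local demo of that end state, using throwaway placeholder key files (the ./ssh_demo directory and key contents are made up for illustration):

```shell
# Demo of the final authorized_keys content: both nodes' public keys,
# mode 600. Uses placeholder demo files, not real keys.
mkdir -p ./ssh_demo
echo "ssh-rsa AAAA...demo1 oracle@rac1" > ./ssh_demo/rac1_id_rsa.pub
echo "ssh-rsa AAAA...demo2 oracle@rac2" > ./ssh_demo/rac2_id_rsa.pub
cat ./ssh_demo/rac1_id_rsa.pub ./ssh_demo/rac2_id_rsa.pub \
    > ./ssh_demo/authorized_keys
chmod 600 ./ssh_demo/authorized_keys
wc -l < ./ssh_demo/authorized_keys   # -> 2 (one key per node)
```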

Choose some config names and IPs

in my case:

cluster name= raccluster
public hostname1 = rac1             192.168.0.201
public hostname2 = rac2             192.168.0.202
virtual hostname1 = rac1-vip       192.168.0.211
virtual hostname2 = rac2-vip       192.168.0.212

virtual IP:                 racvip          192.168.0.203
SCAN addresses:     rac-scan 192.168.0.213  192.168.0.214 192.168.0.215
(defined through the DNS, see my DNS post if like me you forgot...)

vi /etc/nsswitch.conf
hosts: dns files

# service nscd restart
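The public and VIP names can also live in /etc/hosts on both nodes (the SCAN addresses should stay DNS-only, since the point of the SCAN is to resolve to several addresses). A sketch that generates those entries into a demo file, using the addresses chosen above:

```shell
# Sketch: /etc/hosts entries for the public and VIP names above.
# Written to a demo file; append the content to /etc/hosts on both nodes.
# The SCAN (rac-scan) is deliberately NOT here -- it belongs in DNS.
cat > ./hosts.demo <<'EOF'
192.168.0.201  rac1
192.168.0.202  rac2
192.168.0.211  rac1-vip
192.168.0.212  rac2-vip
EOF
grep -c 'vip' ./hosts.demo   # -> 2
```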

Created some directories:

# mkdir -p /u01/app/11.2.0/grid
# chown -R oracle:oinstall /u01/app/11.2.0/grid

# mkdir -p /u01/app/oracle/
# chown -R oracle:oinstall /u01/app/oracle/
# chmod -R 775 /u01/app/oracle/


NTPD:


ntpd is required and needs the special slewing option (-x):

# vi /etc/sysconfig/ntpd
     OPTIONS="-x -u ntp:ntp -p /var/run/ntpd.pid"

# service ntpd restart
# chkconfig ntpd on
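A quick way to verify the slewing flag is really in place (the clusterware installer checks for it). Demonstrated on a local copy of the file; on a real node point CFG at /etc/sysconfig/ntpd:

```shell
# Quick check that -x made it into the ntpd options.
# Demo uses a local copy; set CFG=/etc/sysconfig/ntpd on a real node.
CFG=./ntpd.sysconfig.demo
echo 'OPTIONS="-x -u ntp:ntp -p /var/run/ntpd.pid"' > "$CFG"
if grep -q '^OPTIONS=.*-x' "$CFG"; then
    echo "ntpd slewing enabled"
else
    echo "WARNING: ntpd is missing -x"
fi
```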

 

Setup ASMLib

In this example we use ASM (alternatives are: ocfs2, GFS..)
We install ASMLib, which is just the lower-level software (kernel driver and low-level utilities). The rest of ASM is installed through the 'grid' software.

rac1 & rac2:
wget http://oss.oracle.com/projects/oracleasm/dist/files/RPMS/rhel5/x86/2.0.5/2.6.18-238.el5/oracleasm-2.6.18-238.el5-2.0.5-1.el5.i686.rpm
wget http://oss.oracle.com/projects/oracleasm-support/dist/files/RPMS/rhel5/x86/2.1.7/oracleasm-support-2.1.7-1.el5.i386.rpm
wget http://download.oracle.com/otn_software/asmlib/oracleasmlib-2.0.4-1.el5.i386.rpm
rpm -i oracleasm-support-2.1.7-1.el5.i386.rpm oracleasmlib-2.0.4-1.el5.i386.rpm oracleasm-2.6.18-238.el5-2.0.5-1.el5.i686.rpm

ASMlib configuration: (note the documentation is missing the '-i' option)

# /usr/sbin/oracleasm configure -i
Default user to own the driver interface []: oracle
Default group to own the driver interface []: dba
Start Oracle ASM library driver on boot (y/n) [n]: y
Scan for Oracle ASM disks on boot (y/n) [y]: y
Writing Oracle ASM library driver configuration: done


rac1# /usr/sbin/oracleasm createdisk ASMDISK1 /dev/sdd1

rac1# /usr/sbin/oracleasm listdisks
ASMDISK1

rac2# /usr/sbin/oracleasm scandisks
rac2# /usr/sbin/oracleasm listdisks
ASMDISK1

I can see the ASM disk on both nodes. Good!


Grid Installation

The grid software contains ASM and Oracle Clusterware.
In this test setup I used the same 'oracle' user (with hindsight I should have used a separate 'grid' user, much cleaner to separate the grid/clusterware from the DB itself)

export ORACLE_BASE=/u01/app/oracle/
export ORACLE_HOME=/u01/app/11.2.0/grid
./runInstaller

I met this error: [INS-40910] Virtual IP: entered is invalid.
Misleading: in my case it was due to bad reverse DNS resolution...

Then I ran the root script, which started a bunch of services and used the ASM disk:

 ohasd is starting
CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2672: Attempting to start 'ora.ctssd' on 'rac1'
CRS-2672: Attempting to start 'ora.crsd' on 'rac1'
CRS-2672: Attempting to start 'ora.evmd' on 'rac1'
CRS-2672: Attempting to start 'ora.asm' on 'rac1'
CRS-2672: Attempting to start 'ora.DATA.dg' on 'rac1'
CRS-2672: Attempting to start 'ora.registry.acfs' on 'rac1'


Verifications:

 $ ./crsctl check cluster -all
**************************************************************
rac1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
rac2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************


Oracle processes after clusterware + ASM (grid) install

Oh man, it works, but Oracle is not really lightweight... and we haven't installed any real DB yet!

root      8521     1  0 07:47 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
root      8544     1  0 07:47 ?        00:00:04 /u01/app/11.2.0/grid/bin/ohasd.bin reboot
root      9842     1  0 07:48 ?        00:00:01 /u01/app/11.2.0/grid/bin/orarootagent.bin
oracle   10624     1  0 07:51 ?        00:00:03 /u01/app/11.2.0/grid/bin/oraagent.bin
oracle   10639     1  0 07:51 ?        00:00:00 /u01/app/11.2.0/grid/bin/mdnsd.bin
oracle   10651     1  0 07:51 ?        00:00:00 /u01/app/11.2.0/grid/bin/gipcd.bin
oracle   10662     1  0 07:51 ?        00:00:01 /u01/app/11.2.0/grid/bin/gpnpd.bin
root     10677     1  0 07:51 ?        00:00:01 /u01/app/11.2.0/grid/bin/cssdmonitor
root     10694     1  0 07:51 ?        00:00:01 /u01/app/11.2.0/grid/bin/cssdagent
oracle   10696     1  0 07:51 ?        00:00:00 /u01/app/11.2.0/grid/bin/diskmon.bin -d -f
oracle   10715     1  0 07:51 ?        00:00:03 /u01/app/11.2.0/grid/bin/ocssd.bin
root     10792     1  0 07:52 ?        00:00:00 /u01/app/11.2.0/grid/bin/octssd.bin
oracle   10852     1  0 07:52 ?        00:00:00 asm_pmon_+ASM1
oracle   10854     1  0 07:52 ?        00:00:00 asm_vktm_+ASM1
oracle   10858     1  0 07:52 ?        00:00:00 asm_gen0_+ASM1
oracle   10860     1  0 07:52 ?        00:00:00 asm_diag_+ASM1
oracle   10862     1  0 07:52 ?        00:00:00 asm_ping_+ASM1
oracle   10864     1  0 07:52 ?        00:00:00 asm_psp0_+ASM1
oracle   10866     1  0 07:52 ?        00:00:00 asm_dia0_+ASM1
oracle   10868     1  0 07:52 ?        00:00:00 asm_lmon_+ASM1
oracle   10870     1  0 07:52 ?        00:00:00 asm_lmd0_+ASM1
oracle   10873     1  0 07:52 ?        00:00:00 asm_lms0_+ASM1
oracle   10877     1  0 07:52 ?        00:00:00 asm_lmhb_+ASM1
oracle   10879     1  0 07:52 ?        00:00:00 asm_mman_+ASM1
oracle   10881     1  0 07:52 ?        00:00:00 asm_dbw0_+ASM1
oracle   10883     1  0 07:52 ?        00:00:00 asm_lgwr_+ASM1
oracle   10885     1  0 07:52 ?        00:00:00 asm_ckpt_+ASM1
oracle   10887     1  0 07:52 ?        00:00:00 asm_smon_+ASM1
oracle   10889     1  0 07:52 ?        00:00:00 asm_rbal_+ASM1
oracle   10891     1  0 07:52 ?        00:00:00 asm_gmon_+ASM1
oracle   10893     1  0 07:52 ?        00:00:00 asm_mmon_+ASM1
oracle   10895     1  0 07:52 ?        00:00:00 asm_mmnl_+ASM1
oracle   10897     1  0 07:52 ?        00:00:00 /u01/app/11.2.0/grid/bin/oclskd.bin
oracle   10900     1  0 07:52 ?        00:00:00 asm_lck0_+ASM1
root     10912     1  0 07:52 ?        00:00:08 /u01/app/11.2.0/grid/bin/crsd.bin reboot
oracle   10928     1  0 07:52 ?        00:00:01 /u01/app/11.2.0/grid/bin/evmd.bin
oracle   10930     1  0 07:52 ?        00:00:00 asm_asmb_+ASM1
oracle   10932     1  0 07:52 ?        00:00:00 oracle+ASM1_asmb_+asm1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
root     10958     1  0 07:52 ?        00:00:00 /u01/app/11.2.0/grid/bin/oclskd.bin
oracle   10960     1  0 07:52 ?        00:00:01 oracle+ASM1_ocr (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle   11017 10928  0 07:52 ?        00:00:00 /u01/app/11.2.0/grid/bin/evmlogger.bin -o /u01/app/11.2.0/grid/evm/log/evmlogger.info -l /u01/app/11.2.0/grid/evm/log/evmlogger.log
oracle   11220     1  0 07:53 ?        00:00:02 /u01/app/11.2.0/grid/bin/oraagent.bin
root     11388     1  0 07:53 ?        00:00:11 /u01/app/11.2.0/grid/bin/orarootagent.bin
oracle   11415     1  0 07:53 ?        00:00:00 /u01/app/11.2.0/grid/opmn/bin/ons -d
oracle   11416 11415  0 07:53 ?        00:00:00 /u01/app/11.2.0/grid/opmn/bin/ons -d
oracle   11467     1  0 07:53 ?        00:00:03 /u01/app/11.2.0/grid/jdk/jre//bin/java -Doracle.supercluster.cluster.server=eonsd -Djava.net.preferIPv4Stack=true -Djava.util.logging.config.file=/u01/app/11.2.0/grid/srvm/admin/logging.properties -classpath /u01/app/11.2.0/grid/jdk/jre//lib/rt.jar:/u01/app/11.2.0/grid/jlib/srvm.jar:/u01/app/11.2.0/grid/jlib/srvmhas.jar:/u01/app/11.2.0/grid/jlib/supercluster.jar:/u01/app/11.2.0/grid/jlib/supercluster-common.jar:/u01/app/11.2.0/grid/ons/lib/ons.jar oracle.supercluster.impl.cluster.EONSServerImpl
oracle   11609     1  0 07:53 ?        00:00:00 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER_SCAN2 -inherit
oracle   11620     1  0 07:54 ?        00:00:00 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER_SCAN3 -inherit
oracle   12474     1  0 08:05 ?        00:00:00 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER -inherit

 

ASM peek:

With ORACLE_SID set to "+ASM1" and $ORACLE_HOME/bin in the PATH:

$ asmcmd
ASMCMD> ls
DATA/
ASMCMD> du
Used_MB      Mirror_used_MB
    263                 263
...

Database install

Again with the same 'oracle' user, but with a different ORACLE_HOME:

$ export ORACLE_BASE=/u01/app/oracle/
$ export ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1

$ ./runInstaller

-Create & configure
-Server Class
-RAC type
-Typical
-Storage type ASM, and location on DATA

-Global name: DTST


It went OK except for two problems.

Problem 1)
  ORA-00845: MEMORY_TARGET not supported (actually: not enough shared memory!)
Added to /etc/fstab:
shmfs             /dev/shm         tmpfs   size=1200m     0 0

and mounted it manually:
# mount -t tmpfs shmfs -o size=1200m /dev/shm
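The reason for the ORA-00845: with automatic memory management, /dev/shm must be at least as large as MEMORY_TARGET. A small helper to check this up front (the function name is mine, for illustration):

```shell
# Sketch: check /dev/shm is big enough for MEMORY_TARGET.
# shm_big_enough is a hypothetical helper, not an Oracle tool.
shm_big_enough() {  # usage: shm_big_enough <shm_mb> <target_mb>
    [ "$1" -ge "$2" ]
}
# df -m reports the tmpfs size in MB; compare to the 1200m used above.
SHM_MB=$(df -m /dev/shm 2>/dev/null | awk 'NR==2 {print $2}')
if shm_big_enough "${SHM_MB:-0}" 1200; then
    echo "/dev/shm ok (${SHM_MB} MB)"
else
    echo "/dev/shm too small for MEMORY_TARGET (ORA-00845 likely)"
fi
```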

Problem 2)
 a strange error:
CRS-5804: Communication error with agent process
CRS-2632: There are no more servers to try to place resource 'ora.dtst.db' on that would satisfy its placement policy

RAC2$ ./srvctl status database -d DTST
Instance DTST1 is running on node rac1
Instance DTST2 is not running on node rac2

Tried restarting, expecting to see the error again...
$ ./srvctl stop database -d DTST
$ ./srvctl start database -d DTST

$ ./srvctl status database -d DTST
Instance DTST1 is running on node rac1
Instance DTST2 is running on node rac2

but it went OK this time; I should have investigated this, but skipped it for now...


Verifications

Documentation suggests this:

$ cd /u01/app/11.2.0/grid/bin
$ ./crsctl status resource -w "TYPE co 'ora'" -t
what an intuitive command!!!


( alternative: "./crsctl stat resource" is less nice, but I'm having difficulties remembering the other one )
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS      
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       rac1                                        
               ONLINE  ONLINE       rac2                                        
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac1                                        
               ONLINE  ONLINE       rac2                                        
ora.asm
               ONLINE  ONLINE       rac1                     Started            
               ONLINE  ONLINE       rac2                     Started            
ora.eons
               ONLINE  ONLINE       rac1                                        
               ONLINE  ONLINE       rac2                                        
ora.gsd
               OFFLINE OFFLINE      rac1                                        
               OFFLINE OFFLINE      rac2                                        
ora.net1.network
               ONLINE  ONLINE       rac1                                        
               ONLINE  ONLINE       rac2                                        
ora.ons
               ONLINE  ONLINE       rac1                                        
               ONLINE  ONLINE       rac2                                        
ora.registry.acfs
               ONLINE  ONLINE       rac1                                        
               ONLINE  ONLINE       rac2                                        
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac2                                        
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       rac1                                        
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       rac1                                        
ora.dtst.db
      1        ONLINE  ONLINE       rac1                     Open               
      2        ONLINE  ONLINE       rac2                     Open               
ora.oc4j
      1        OFFLINE OFFLINE                                                  
ora.rac1.vip
      1        ONLINE  ONLINE       rac1                                        
ora.rac2.vip
      1        ONLINE  ONLINE       rac2                                        
ora.scan1.vip
      1        ONLINE  ONLINE       rac2                                        
ora.scan2.vip
      1        ONLINE  ONLINE       rac1                                        
ora.scan3.vip
      1        ONLINE  ONLINE       rac1        

Proper RAC shutdown

# ./crsctl stop cluster -all
"This command attempts to gracefully stop resources managed by Oracle Clusterware while attempting to stop the Oracle Clusterware stack."



Conclusion

Ok this was simplistic, but we do have our test RAC system working without any special hardware, using the iSCSI target from the previous post.

At this point we can backup the 2 Virtual machines (rac1, rac2), as well as the file used for the iSCSI disk. And experiment at will...




