Sunday, October 28, 2012

Test HA 3: Oracle RAC setup

Following the setup of an iSCSI shared disk available to 2 VMs, this time I am installing a test Oracle RAC
- again without any special hardware (everything runs on a simple PC)

 

Interconnect network: 

In RAC parlance this is the private network between the nodes (i.e. the heartbeat network).
On each virtual machine I add a 'Host-only network' interface.
In my case I took the 192.168.160.0/24 network and changed the addressing from DHCP to static as follows (using system-config-network, for example):

RAC1: eth1 192.168.160.101 / 255.255.255.0
RAC2: eth1 192.168.160.102 / 255.255.255.0


# service network restart
RAC1 $ ping 192.168.160.102
OK
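For reference, the same static setup can also be written directly into the interface file instead of using system-config-network (a minimal sketch for RAC1 on RHEL5; adjust the IPADDR for RAC2):

# cat /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
BOOTPROTO=static
IPADDR=192.168.160.101
NETMASK=255.255.255.0
ONBOOT=yes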

 

Users & system setup


Reference: Database Installation Guide for Linux

On each node:

# yum install gcc elfutils-libelf-devel glibc-devel libaio-devel libstdc++-devel unixODBC unixODBC-devel gcc-c++

# groupadd dba
# groupadd oinstall
# groupadd asmdba
# useradd -m oracle -g oinstall -G dba,asmdba
# passwd oracle

# cat >>  /etc/security/limits.conf
 oracle          hard    nofile          65536
 oracle          soft    nofile          65536

# cat >>  /etc/sysctl.conf
 kernel.sem = 250        32000   100      128
 fs.file-max = 6815744
 net.ipv4.ip_local_port_range = 9000    65500
 net.core.rmem_default = 262144
 net.core.rmem_max = 4194304
 net.core.wmem_default = 262144
 net.core.wmem_max = 4194304
 fs.aio-max-nr = 1048576

# /sbin/sysctl -p

# mkdir /opt/oracle
# chown oracle.oinstall /opt/oracle/

ssh connection without password

RAC1# su - oracle
RAC1$ ssh-keygen -b 2048
(type enter for an empty passphrase)

Repeat on RAC2 (this creates .ssh directory and private/pub keys)

RAC1$ scp .ssh/id_rsa.pub rac2:/home/oracle/.ssh/authorized_keys
RAC2$ chmod 600 /home/oracle/.ssh/authorized_keys
RAC2$ scp .ssh/id_rsa.pub rac1:/home/oracle/.ssh/authorized_keys
RAC1$ chmod 600 /home/oracle/.ssh/authorized_keys

Then ssh works without password from one node to the other
RAC1$ ssh RAC2

OUI (the Oracle Universal Installer) also needs passwordless ssh from each node to itself, so on each node we also append our own public key:
$ cd ~/.ssh && cat id_rsa.pub >> authorized_keys
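A quick sanity check that the keys are in place (run as oracle on each node; it should print both hostnames without asking for a password):

$ for h in rac1 rac2; do ssh $h hostname; done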

Choose some config names and IPs

in my case:

cluster name= raccluster
public hostname1 = rac1             192.168.0.201
public hostname2 = rac2             192.168.0.202
virtual hostname1 = rac1-vip       192.168.0.211
virtual hostname2 = rac2-vip       192.168.0.212

virtual IP:                 racvip          192.168.0.203
SCAN addresses:     rac-scan 192.168.0.213  192.168.0.214 192.168.0.215
(defined through DNS, see my DNS post if, like me, you forgot how...)

vi /etc/nsswitch.conf
hosts: dns files

# service nscd restart
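Each node must of course point to that DNS server for the SCAN and VIP names to resolve. A minimal sketch, assuming the BIND server from my DNS post at 192.168.0.4:

# cat /etc/resolv.conf
search philtortoise.com
nameserver 192.168.0.4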

Created some directories:

# mkdir -p /u01/app/11.2.0/grid
# chown -R oracle:oinstall /u01/app/11.2.0/grid

# mkdir -p /u01/app/oracle/
# chown -R oracle:oinstall /u01/app/oracle/
# chmod -R 775 /u01/app/oracle/


NTPD:


ntpd is needed and requires the slewing option (-x):

# vi /etc/sysconfig/ntpd
     OPTIONS="-x -u ntp:ntp -p /var/run/ntpd.pid"

# service ntpd restart
# chkconfig ntpd on
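A quick way to verify that ntpd is actually talking to its servers (standard ntp tooling; the peer list should show a non-zero reach value after a few minutes):

# ntpq -p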

 

Setup ASMLib

In this example we use ASM (alternatives are: OCFS2, GFS...)
We install ASMLib, which is just the lower-level software (kernel driver and low-level utilities). The rest of ASM is installed through the 'grid' software.

rac1 & rac2:
wget http://oss.oracle.com/projects/oracleasm/dist/files/RPMS/rhel5/x86/2.0.5/2.6.18-238.el5/oracleasm-2.6.18-238.el5-2.0.5-1.el5.i686.rpm
wget http://oss.oracle.com/projects/oracleasm-support/dist/files/RPMS/rhel5/x86/2.1.7/oracleasm-support-2.1.7-1.el5.i386.rpm
wget http://download.oracle.com/otn_software/asmlib/oracleasmlib-2.0.4-1.el5.i386.rpm
rpm -i oracleasm-support-2.1.7-1.el5.i386.rpm oracleasmlib-2.0.4-1.el5.i386.rpm oracleasm-2.6.18-238.el5-2.0.5-1.el5.i686.rpm

ASMlib configuration: (note the documentation is missing the '-i' option)

# /usr/sbin/oracleasm configure -i
Default user to own the driver interface []: oracle
Default group to own the driver interface []: dba
Start Oracle ASM library driver on boot (y/n) [n]: y
Scan for Oracle ASM disks on boot (y/n) [y]: y
Writing Oracle ASM library driver configuration: done


rac1# /usr/sbin/oracleasm createdisk ASMDISK1 /dev/sdd1

rac1# /usr/sbin/oracleasm listdisks
ASMDISK1

rac2# /usr/sbin/oracleasm scandisks
rac2# /usr/sbin/oracleasm listdisks
ASMDISK1

I can see the ASM disk on both nodes. Good!
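Two more sanity checks I find useful (a sketch; ASMLib normally exposes its labelled disks under /dev/oracleasm/disks):

# /usr/sbin/oracleasm querydisk ASMDISK1
# ls -l /dev/oracleasm/disks/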


Grid Installation

The grid software contains ASM and Oracle Clusterware.
In this test setup I used the same 'oracle' user (with hindsight I should have used a separate 'grid' user, it is much cleaner to keep the grid/clusterware separate from the DB itself).

export ORACLE_BASE=/u01/app/oracle/
export ORACLE_HOME=/u01/app/11.2.0/grid
./runInstaller

I met this error: [INS-40910] Virtual IP: entered is invalid.
Misleading: in my case it was due to bad reverse DNS resolution...

Run the root script, which started a bunch of services and used the ASM disk:

 ohasd is starting
CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2672: Attempting to start 'ora.ctssd' on 'rac1'
CRS-2672: Attempting to start 'ora.crsd' on 'rac1'
CRS-2672: Attempting to start 'ora.evmd' on 'rac1'
CRS-2672: Attempting to start 'ora.asm' on 'rac1'
CRS-2672: Attempting to start 'ora.DATA.dg' on 'rac1'
CRS-2672: Attempting to start 'ora.registry.acfs' on 'rac1'


Verifications:

 $ ./crsctl check cluster -all
**************************************************************
rac1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
rac2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************


Oracle processes after clusterware + ASM (grid) install

Oh man, it works, but Oracle is not exactly lightweight... and we haven't installed any real DB yet!

root      8521     1  0 07:47 ?        00:00:00 /bin/sh /etc/init.d/init.ohasd run
root      8544     1  0 07:47 ?        00:00:04 /u01/app/11.2.0/grid/bin/ohasd.bin reboot
root      9842     1  0 07:48 ?        00:00:01 /u01/app/11.2.0/grid/bin/orarootagent.bin
oracle   10624     1  0 07:51 ?        00:00:03 /u01/app/11.2.0/grid/bin/oraagent.bin
oracle   10639     1  0 07:51 ?        00:00:00 /u01/app/11.2.0/grid/bin/mdnsd.bin
oracle   10651     1  0 07:51 ?        00:00:00 /u01/app/11.2.0/grid/bin/gipcd.bin
oracle   10662     1  0 07:51 ?        00:00:01 /u01/app/11.2.0/grid/bin/gpnpd.bin
root     10677     1  0 07:51 ?        00:00:01 /u01/app/11.2.0/grid/bin/cssdmonitor
root     10694     1  0 07:51 ?        00:00:01 /u01/app/11.2.0/grid/bin/cssdagent
oracle   10696     1  0 07:51 ?        00:00:00 /u01/app/11.2.0/grid/bin/diskmon.bin -d -f
oracle   10715     1  0 07:51 ?        00:00:03 /u01/app/11.2.0/grid/bin/ocssd.bin
root     10792     1  0 07:52 ?        00:00:00 /u01/app/11.2.0/grid/bin/octssd.bin
oracle   10852     1  0 07:52 ?        00:00:00 asm_pmon_+ASM1
oracle   10854     1  0 07:52 ?        00:00:00 asm_vktm_+ASM1
oracle   10858     1  0 07:52 ?        00:00:00 asm_gen0_+ASM1
oracle   10860     1  0 07:52 ?        00:00:00 asm_diag_+ASM1
oracle   10862     1  0 07:52 ?        00:00:00 asm_ping_+ASM1
oracle   10864     1  0 07:52 ?        00:00:00 asm_psp0_+ASM1
oracle   10866     1  0 07:52 ?        00:00:00 asm_dia0_+ASM1
oracle   10868     1  0 07:52 ?        00:00:00 asm_lmon_+ASM1
oracle   10870     1  0 07:52 ?        00:00:00 asm_lmd0_+ASM1
oracle   10873     1  0 07:52 ?        00:00:00 asm_lms0_+ASM1
oracle   10877     1  0 07:52 ?        00:00:00 asm_lmhb_+ASM1
oracle   10879     1  0 07:52 ?        00:00:00 asm_mman_+ASM1
oracle   10881     1  0 07:52 ?        00:00:00 asm_dbw0_+ASM1
oracle   10883     1  0 07:52 ?        00:00:00 asm_lgwr_+ASM1
oracle   10885     1  0 07:52 ?        00:00:00 asm_ckpt_+ASM1
oracle   10887     1  0 07:52 ?        00:00:00 asm_smon_+ASM1
oracle   10889     1  0 07:52 ?        00:00:00 asm_rbal_+ASM1
oracle   10891     1  0 07:52 ?        00:00:00 asm_gmon_+ASM1
oracle   10893     1  0 07:52 ?        00:00:00 asm_mmon_+ASM1
oracle   10895     1  0 07:52 ?        00:00:00 asm_mmnl_+ASM1
oracle   10897     1  0 07:52 ?        00:00:00 /u01/app/11.2.0/grid/bin/oclskd.bin
oracle   10900     1  0 07:52 ?        00:00:00 asm_lck0_+ASM1
root     10912     1  0 07:52 ?        00:00:08 /u01/app/11.2.0/grid/bin/crsd.bin reboot
oracle   10928     1  0 07:52 ?        00:00:01 /u01/app/11.2.0/grid/bin/evmd.bin
oracle   10930     1  0 07:52 ?        00:00:00 asm_asmb_+ASM1
oracle   10932     1  0 07:52 ?        00:00:00 oracle+ASM1_asmb_+asm1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
root     10958     1  0 07:52 ?        00:00:00 /u01/app/11.2.0/grid/bin/oclskd.bin
oracle   10960     1  0 07:52 ?        00:00:01 oracle+ASM1_ocr (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle   11017 10928  0 07:52 ?        00:00:00 /u01/app/11.2.0/grid/bin/evmlogger.bin -o /u01/app/11.2.0/grid/evm/log/evmlogger.info -l /u01/app/11.2.0/grid/evm/log/ev
mlogger.log
oracle   11220     1  0 07:53 ?        00:00:02 /u01/app/11.2.0/grid/bin/oraagent.bin
root     11388     1  0 07:53 ?        00:00:11 /u01/app/11.2.0/grid/bin/orarootagent.bin
oracle   11415     1  0 07:53 ?        00:00:00 /u01/app/11.2.0/grid/opmn/bin/ons -d
oracle   11416 11415  0 07:53 ?        00:00:00 /u01/app/11.2.0/grid/opmn/bin/ons -d
oracle   11467     1  0 07:53 ?        00:00:03 /u01/app/11.2.0/grid/jdk/jre//bin/java -Doracle.supercluster.cluster.server=eonsd -Djava.net.preferIPv4Stack=true -Djava
.util.logging.config.file=/u01/app/11.2.0/grid/srvm/admin/logging.properties -classpath /u01/app/11.2.0/grid/jdk/jre//lib/rt.jar:/u01/app/11.2.0/grid/jlib/srvm.jar:/u01
/app/11.2.0/grid/jlib/srvmhas.jar:/u01/app/11.2.0/grid/jlib/supercluster.jar:/u01/app/11.2.0/grid/jlib/supercluster-common.jar:/u01/app/11.2.0/grid/ons/lib/ons.jar orac
le.supercluster.impl.cluster.EONSServerImpl
oracle   11609     1  0 07:53 ?        00:00:00 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER_SCAN2 -inherit
oracle   11620     1  0 07:54 ?        00:00:00 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER_SCAN3 -inherit
oracle   12474     1  0 08:05 ?        00:00:00 /u01/app/11.2.0/grid/bin/tnslsnr LISTENER -inherit

 

ASM peek:

With export ORACLE_SID="+ASM1" and $ORACLE_HOME/bin in the PATH:

$ asmcmd
ASMCMD> ls
DATA/
ASMCMD> du
Used_MB      Mirror_used_MB
    263                 263
...
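lsdg is also handy at this point to check the disk group state and free space before creating the database:

ASMCMD> lsdg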

Database install

Again with the same 'oracle' user, but with a different ORACLE_HOME:

$ export ORACLE_BASE=/u01/app/oracle/
$ export ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1

$ ./runInstaller

-Create & configure
-Server Class
-RAC type
-Typical
-Storage type ASM, and location on DATA

-Global name: DTST


It went OK except for two problems.

problem 1)
  ORA-00845: MEMORY_TARGET not supported (actually not enough shared memory in /dev/shm!)
Added to /etc/fstab:
shmfs             /dev/shm         tmpfs   size=1200m     0 0

and mounted it manually:
# mount -t tmpfs shmfs -o size=1200m /dev/shm

problem 2)
 a strange error:
CRS-5804: Communication error with agent process
CRS-2632: There are no more servers to try to place resource 'ora.dtst.db' on that would satisfy its placement policy

RAC2$ ./srvctl status database -d DTST
Instance DTST1 is running on node rac1
Instance DTST2 is not running on node rac2

Tried to restart, expecting to see the error...
$ ./srvctl stop database -d DTST
$ ./srvctl start database -d DTST

$ ./srvctl status database -d DTST
Instance DTST1 is running on node rac1
Instance DTST2 is running on node rac2

But it went OK this time. I should have investigated this, but skipped it for now...


Verifications

Documentation suggests this:

$ cd /u01/app/11.2.0/grid/bin
 $ ./crsctl status resource -w "TYPE co 'ora'" -t
what an intuitive command!!!


( alternative: "./crsctl stat resource" is less pretty, but I have trouble remembering the other one )
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS      
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       rac1                                        
               ONLINE  ONLINE       rac2                                        
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac1                                        
               ONLINE  ONLINE       rac2                                        
ora.asm
               ONLINE  ONLINE       rac1                     Started            
               ONLINE  ONLINE       rac2                     Started            
ora.eons
               ONLINE  ONLINE       rac1                                        
               ONLINE  ONLINE       rac2                                        
ora.gsd
               OFFLINE OFFLINE      rac1                                        
               OFFLINE OFFLINE      rac2                                        
ora.net1.network
               ONLINE  ONLINE       rac1                                        
               ONLINE  ONLINE       rac2                                        
ora.ons
               ONLINE  ONLINE       rac1                                        
               ONLINE  ONLINE       rac2                                        
ora.registry.acfs
               ONLINE  ONLINE       rac1                                        
               ONLINE  ONLINE       rac2                                        
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac2                                        
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       rac1                                        
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       rac1                                        
ora.dtst.db
      1        ONLINE  ONLINE       rac1                     Open               
      2        ONLINE  ONLINE       rac2                     Open               
ora.oc4j
      1        OFFLINE OFFLINE                                                  
ora.rac1.vip
      1        ONLINE  ONLINE       rac1                                        
ora.rac2.vip
      1        ONLINE  ONLINE       rac2                                        
ora.scan1.vip
      1        ONLINE  ONLINE       rac2                                        
ora.scan2.vip
      1        ONLINE  ONLINE       rac1                                        
ora.scan3.vip
      1        ONLINE  ONLINE       rac1        

Proper RAC shutdown

# ./crsctl stop cluster -all
"This command attempts to gracefully stop resources managed by Oracle Clusterware while attempting to stop the Oracle Clusterware stack."



Conclusion

Ok this was simplistic, but we do have our test RAC system working without any special hardware, using the iSCSI target from the previous post.

At this point we can backup the 2 Virtual machines (rac1, rac2), as well as the file used for the iSCSI disk. And experiment at will...





Saturday, October 27, 2012

Test HA 2: Veritas cluster/Oracle setup



Following our test iSCSI setup without hardware, here is an example of a typical VCS/Oracle fail-over setup. This is on RHEL5.

setup heartbeat network

You need 2 more virtual network cards on node1 and node2, preferably on separate logical networks.
If needed, re-run the VMware config (/usr/bin/vmware-config.pl) to create 2 local 'host-only' subnets, 192.168.130.0 and 192.168.131.0, because I suspect LLT may not work on the same bridged network.
Then add 2 more network cards to each VM.

Assign the addresses (system-config-network). In our example we will use:

node1:
 192.168.130.160/24  (eth1)
 192.168.131.160/24  (eth2)

node2:
 192.168.130.161/24  (eth1)
 192.168.131.161/24  (eth2)

run system-config-network and setup accordingly, then '/etc/init.d/network restart'

from node1 perform basic tests:
 ping 192.168.130.161

 ping 192.168.131.161

VCS prerequisites


note: this is for VCS 5.1 on RHEL 5.5; check the install manual

# yum install compat-libgcc compat-libstdc++ glibc-2.5 libgcc glibc libstdc++ java-1.4.2

append to /etc/hosts, for easier admin, on each node:
 192.168.0.201 node1
 192.168.0.202 node2


ssh keys:


 ssh-keygen -t dsa  (on each node)

 node1# scp /root/.ssh/id_dsa.pub node2:/root/.ssh/authorized_keys2
 node2# scp /root/.ssh/id_dsa.pub node1:/root/.ssh/authorized_keys2

Verify you can connect without password from node1 to node2, and the other way around
node1# ssh node2

Update .bash_profile

PATH=/opt/VRTS/bin:$PATH; export PATH
MANPATH=/usr/share/man:/opt/VRTS/man; export MANPATH

Set the kernel to reboot after a panic:
 # sysctl -w kernel.panic=10
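To make this survive a reboot, also append it to /etc/sysctl.conf:

# echo "kernel.panic = 10" >> /etc/sysctl.conf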

precheck (from the VCS cd, or tar.gz extracted):

 ./installvcs -precheck node1 node2


VCS install

 ./installvcs

choices:

 I : install

 1)  Veritas Cluster Server (VCS)

 3)  Install all Veritas Cluster Server rpms - 322 MB required

 Enter the 64 bit RHEL5 system names separated by spaces: [q,?] node1 node2

(enter license key or 60days without)

 Would you like to configure VCS on node1 node2 [y,n,q] (n) y

 Enter the unique cluster name: [q,?] vmclu160
 Enter a unique Cluster ID number between 0-65535: [b,q,?] (0) 160

 Enter the NIC for the first private heartbeat link on node1: [b,q,?] eth1
 eth1 has an IP address configured on it. It could be a public NIC on node1.
 Are you sure you want to use eth1 for the first private heartbeat link?
 [y,n,q,b,?] (n) y
 Is eth1 a bonded NIC? [y,n,q] (n)
 Would you like to configure a second private heartbeat link? [y,n,q,b,?] (y)
 Enter the NIC for the second private heartbeat link on node1: [b,q,?] eth2
 eth2 has an IP address configured on it. It could be a public NIC on node1.
 Are you sure you want to use eth2 for the second private heartbeat link?
 [y,n,q,b,?] (n) y
 Is eth2 a bonded NIC? [y,n,q] (n)
 Would you like to configure a third private heartbeat link? [y,n,q,b,?] (n) n

 Do you want to configure an additional low priority heartbeat link?
 [y,n,q,b,?] (n) y
 Enter the NIC for the low priority heartbeat link on node1: [b,q,?] (eth0)
 Is eth0 a bonded NIC? [y,n,q] (n)
 Are you using the same NICs for private heartbeat links on all systems?
 [y,n,q,b,?] (y) y


 Cluster information verification:
        Cluster Name:      vmclu160
        Cluster ID Number: 160
        Private Heartbeat NICs for node1:
                link1=eth1
                link2=eth2
        Low Priority Heartbeat NIC for node1: link-lowpri=eth0
        Private Heartbeat NICs for node2:
                link1=eth1
                link2=eth2
        Low Priority Heartbeat NIC for node2: link-lowpri=eth0
 Is this information correct? [y,n,q,b,?] (y) y


 Virtual IP can be specified in RemoteGroup resource, and can be used to
 connect to the cluster using Java GUI
 The following data is required to configure the Virtual IP of the Cluster:
        A public NIC used by each system in the cluster
        A Virtual IP address and netmask
 Do you want to configure the Virtual IP? [y,n,q,?] (n) y
 Active NIC devices discovered on node1: eth0 eth1 eth2
 Enter the NIC for Virtual IP of the Cluster to use on node1: [b,q,?] (eth0)
 Is eth0 to be the public NIC used by all systems? [y,n,q,b,?] (y)
 Enter the Virtual IP address for the Cluster: [b,q,?] 192.168.0.203
 Enter the NetMask for IP 192.168.0.203: [b,q,?] (255.255.255.0)
 Would you like to configure VCS to use Symantec Security Services? [y,n,q] (n)

 Do you want to set the username and/or password for the Admin user
 (default username = 'admin', password='password')? [y,n,q] (n) y
 Enter the user name: [b,q,?] (admin)
 Enter the password:
 Enter again:
 Do you want to add another user to the cluster? [y,n,q] (n) n

For this test setup, answer n to SMTP and Global cluster and let it restart


node1# hastatus -sum
 -- SYSTEM STATE
 -- System               State                Frozen
 A  node1                RUNNING              0
 A  node2                RUNNING              0
 -- GROUP STATE
 -- Group           System               Probed     AutoDisabled    State       
 B  ClusterService  node1                Y          N               ONLINE      
 B  ClusterService  node2                Y          N               OFFLINE
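The heartbeat links themselves can be checked with the standard LLT and GAB utilities (already in the PATH via /opt/VRTS/bin):

# lltstat -nvv
# gabconfig -a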


Oracle binary install


Reference: Database Installation Guide for Linux

On each node:

# yum install gcc elfutils-libelf-devel glibc-devel libaio-devel libstdc++-devel unixODBC unixODBC-devel gcc-c++

# groupadd dba
# groupadd oinstall
# groupadd asmdba
# useradd -m oracle -g oinstall -G dba,asmdba
# passwd oracle

 # cat >>  /etc/security/limits.conf
 oracle          hard    nofile          65536
 oracle          soft    nofile          65536

For RHEL5.5:
 # cat >>  /etc/sysctl.conf
 kernel.sem = 250        32000   100      128
 fs.file-max = 6815744
 net.ipv4.ip_local_port_range = 9000    65500
 net.core.rmem_default = 262144
 net.core.rmem_max = 4194304
 net.core.wmem_default = 262144
 net.core.wmem_max = 4194304
 fs.aio-max-nr = 1048576

 # /sbin/sysctl -p

 # mkdir /opt/oracle
 # chown oracle.oinstall /opt/oracle/

As ORACLE user:
In this example setup, Oracle binaries are installed on both nodes, in /opt/oracle.
Extract the Oracle 11gR2 distribution, make sure you have a working X connection, and run:
 $ ./runInstaller

install database software only
single instance database installation
enterprise
(select options -> only partitioning)
oracle base=/opt/oracle/app/oracle
sw loc=/opt/oracle/app/oracle/product/11.2.0/dbhome_1
leave other defaults


 $ cat >> ~/.bash_profile
 export ORACLE_HOME=/opt/oracle/app/oracle/product/11.2.0/dbhome_1
 export PATH=$ORACLE_HOME/bin:$PATH

Oracle instance install


mount the shared disk on /database (as root)
# mkdir /database
# chown oracle.dba /database      (do these on both nodes)

# mount /dev/sdd1 /database/      (do this only on node1, to create the instance)


create the instance with dbca (as oracle)

$ dbca

create a test database, especially set:

 Use common location for all database files: /database

 $ export ORACLE_SID=DTST    (and add this to oracle .bash_profile on both nodes)


 $ sqlplus "/ as sysdba"
 SQL> select * from dual;
 SQL> shutdown immediate

copy spfile to the other node (as oracle):
$ scp /opt/oracle/app/oracle/product/11.2.0/dbhome_1/dbs/spfileDTST.ora node2:/opt/oracle/app/oracle/product/11.2.0/dbhome_1/dbs

copy also the directory structure created for audit logs:
$ scp -r /opt/oracle/app/oracle/admin/DTST node2:/opt/oracle/app/oracle/admin/DTST
(it seems the /opt/oracle/app/oracle/diag/rdbms/dtst structure for traces etc. is created automatically)
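Since the VCS Oracle agent configured later references initDTST.ora, it is worth having a one-line pfile on both nodes that simply points to the copied spfile (a sketch using the paths of this setup):

$ cat $ORACLE_HOME/dbs/initDTST.ora
SPFILE='/opt/oracle/app/oracle/product/11.2.0/dbhome_1/dbs/spfileDTST.ora'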


Set $ORACLE_HOME/network/admin/listener.ora to listen on the virtual IP we will use, here 192.168.0.204:

 LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.0.204)(PORT = 1521))
    )
  )
 ADR_BASE_LISTENER = /opt/oracle/app/oracle
 SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = DTST)
      (ORACLE_HOME =/opt/oracle/app/oracle/product/11.2.0/dbhome_1)
      (SID_NAME = DTST)
    )
  )

and tnsnames.ora:

 DTST =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.0.204)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = DTST)
    )
  )

copy both files to the other node (as oracle):
$ scp /opt/oracle/app/oracle/product/11.2.0/dbhome_1/network/admin/*.ora node2:/opt/oracle/app/oracle/product/11.2.0/dbhome_1/network/admin

VCS service group config for Oracle

umount shared disk

# umount /database


update /etc/VRTSvcs/conf/config/main.cf and add the following service group:


 group OraGroup (
        SystemList = { node1 = 0, node2 = 1 }
        AutoStartList = { node1, node2 }
        )
        DiskReservation DR_ora (
                Disks @node1 = { "/dev/sdd" }
                Disks @node2 = { "/dev/sdd" }
                FailFast = 1
                )
        Mount Mount_oraprod_dfiles (
                MountPoint = "/database"
                BlockDevice = "/dev/sdd1"
                FSType = ext3
                FsckOpt = "-n"
                )
        IP IP_oraprod (
                Device = eth0
                Address = "192.168.0.204"
                NetMask = "255.255.250.0"
                )
        NIC NIC_oraprod (
                Device = eth0
                )
        Netlsnr LSNR_oraprod_lsnr (
                Owner = oracle
                Home = "/opt/oracle/app/oracle/product/11.2.0/dbhome_1"
                TnsAdmin = "/opt/oracle/app/oracle/product/11.2.0/dbhome_1/network/admin"
                Listener = LISTENER
                )
        Oracle ORA_oraprod (
                Sid =DTST
                Owner = oracle
                Home = "/opt/oracle/app/oracle/product/11.2.0/dbhome_1"
                Pfile = "/opt/oracle/app/oracle/product/11.2.0/dbhome_1/dbs/initDTST.ora"
                StartUpOpt = STARTUP
                )
        IP_oraprod requires NIC_oraprod
        LSNR_oraprod_lsnr requires IP_oraprod
        LSNR_oraprod_lsnr requires ORA_oraprod
        Mount_oraprod_dfiles requires DR_ora
        ORA_oraprod requires Mount_oraprod_dfiles


note: this is a simple setup: no Veritas Volume Manager, and no DetailMonitoring of the DB (i.e. if the DB hangs, no failure is detected)

check:
# hacf -verify /etc/VRTSvcs/conf/config/

stop/start the cluster to re-read the main.cf (in prod we could use hacf -cftocmd .../config/ and run main.cmd etc.)

# hastop -all
# hastart

# hastart         (on node2)

verify log while starting:

# tail -f /var/VRTSvcs/log/engine_A.log

# hastatus -sum
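Once the group is ONLINE (so the listener runs on the VIP), a quick connectivity check from either node; tnsping only tests the listener, not the instance itself:

$ tnsping DTST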

Failover quick test


Simulate a problem on oracle

$ su - oracle
$ sqlplus "/ as sysdba"
 SQL> shutdown immediate

and verify if fail-over works.

connect for real to the database on node2 and check it is OPEN

 SQL> select * from dual;
 D
 -
 X

 SQL> select STATUS from V$instance;
 STATUS
 ------------
 OPEN

Of course we should also test with an external client, using the VIP.
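For example, from any machine with an Oracle client, using the EZConnect syntax against the VIP (a sketch; 'system' is just a convenient account to test with):

$ sqlplus system@//192.168.0.204:1521/DTST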


Conclusion: 

We have a working test cluster without special hardware, using the iSCSI target from the previous post.
At this point we can back up the 2 virtual machines (node1, node2), as well as the file used for the iSCSI disk. And experiment at will...

Friday, October 26, 2012

Test HA (cluster,RAC...) without shared disk array


In my job I work quite often with HA: Veritas cluster, linux cluster, Oracle RAC...

But I don't always have a shared disk array available for tests, these are usually too expensive (10K+)  to buy just for sandbox testing.

As a result people lack practice with these HA setups: nobody dares to touch them, and nobody is confident when problems arise!

My solution is to use any host or VM as iSCSI target to act as a shared array, and use it to build the whole HA setup:

Fake "shared array" configuration


Install "target" packages:

Use any host or VM running Linux. All the necessary packages are available in popular distributions:

# yum install scsi-target-utils (red hat and similar)

# apt-get install tgt (debian/ubuntu)

Create a fake device

If you don't have a partition handy, create a large file which will act as a device:

# dd if=/dev/zero of=/sharedarray/device1.bin bs=1024k count=1000
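(With GNU dd a sparse file works too and is created instantly; seek plus count=0 just sets the size without writing the zeros:)

# dd if=/dev/zero of=/sharedarray/device1.bin bs=1024k seek=1000 count=0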

Make it available through iSCSI:

Edit /etc/tgt/targets.conf to make this file/device available.
Here you have to invent an IQN (iSCSI Qualified Name). For example I used:


# vi /etc/tgt/targets.conf
<target iqn.2012-09.com.philtortoise.example:testserver4.target1>
    backing-store /sharedarray/device1.bin
</target>

Restart the service:

# service tgt restart      (/etc/init.d/tgt restart in older Ubuntu versions)

Verify something listens on port 3260 (standard iSCSI port)

# netstat -plnt | grep 3260
tcp        0      0 0.0.0.0:3260            0.0.0.0:*               LISTEN      6924/tgtd      
tcp6       0      0 :::3260                 :::*                    LISTEN      6924/tgtd   
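We can also ask the target daemon itself what it exports (tgt-admin ships with the same package and lists the targets with their LUNs):

# tgt-admin -s        (or: tgtadm --lld iscsi --mode target --op show)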

Here we have our test shared array! Of course there is no RAID here, nor redundant network access, but this is enough for testing, even fail-over etc...
Next, let's connect the nodes...

Connect a first node

Install "initiator" packages:

the initiator is the "client" in iSCSI parlance.

# yum install iscsi-initiator-utils   (red hat)
# apt-get install open-iscsi-utils open-iscsi  (ubuntu - one for iscsiadm, one for the daemon)




Start and discover our array's device

# /etc/init.d/iscsi start  (red hat)
# /etc/init.d/iscsi-network-interface start  (ubuntu)






The following commands check what has been discovered; so far, nothing:
# iscsiadm -m node
# iscsiadm -m discovery -P 1

Let's discover all disks available at our address:
# iscsiadm -m discoverydb -t sendtargets -p 192.168.0.4 --discover

OR we can do it like so, to specify the IQN:
# iscsiadm -m discovery -t sendtargets -p 192.168.0.4:3260
192.168.0.4:3260,1 iqn.2012-09.com.philtortoise.example:testserver4.target1

In all cases, the info is now persisted in /var/lib/iscsi, and visible with the following commands:
# iscsiadm -m node
192.168.0.4:3260,1 iqn.2012-09.com.philtortoise.example:testserver4.target1

# iscsiadm -m discovery -P 1
SENDTARGETS:
DiscoveryAddress: 192.168.0.4,3260
Target: iqn.2012-09.com.philtortoise.example:testserver4.target1
    Portal: 192.168.0.4:3260,1
        Iface Name: default
...


Log in to the device

We can now login/connect to the drive:


# iscsiadm -m node --targetname "iqn.2012-09.com.philtortoise.example:testserver4.target1" --portal "192.168.0.4:3260" --login
Logging in to [iface: default, target: iqn.2012-09.com.philtortoise.example:testserver4.target1, portal: 192.168.0.4,3260]
Login to [iface: default, target: iqn.2012-09.com.philtortoise.example:testserver4.target1, portal: 192.168.0.4,3260]: successful

Fdisk!

And now the disk is visible!:

# fdisk -l
...
Disk /dev/sdc doesn't contain a valid partition table

Disk /dev/sdd: 1048 MB, 1048576000 bytes
33 heads, 61 sectors/track, 1017 cylinders
Units = cylinders of 2013 * 512 = 1030656 bytes

Disk /dev/sdd doesn't contain a valid partition table

(and this survives reboot - with the default configuration on RHEL at least. Otherwise check /etc/iscsi/iscsid.conf and the doc in /usr/share/doc/iscsi-initiator-utils-6.2.0.872/)
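The relevant knob is node.startup, which can be set globally in iscsid.conf or per target with iscsiadm (a sketch using the IQN from above):

# iscsiadm -m node -T iqn.2012-09.com.philtortoise.example:testserver4.target1 -p 192.168.0.4:3260 -o update -n node.startup -v automatic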

Second node

Repeat the above, and test through reboot.

Create a partition, and make sure it is visible on both sides

# fdisk /dev/sdd
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-1017, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-1017, default 1017): +900M

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.


Quick test

We can do a quick test, creating a filesystem and mounting it subsequently on both nodes:

# mkfs.ext3 /dev/sdd1 
# mount /dev/sdd1 /mnt
# echo "hello world" > /mnt/helloworld.txt
# umount /dev/sdd1

Check we can mount and read it on the second node (note: ext3 is not a cluster filesystem, so make sure it is not mounted on node 1 anymore!)

If /dev/sdd1 is not seen at this point, you may need '/etc/init.d/iscsi restart' or reboot

 # mount /dev/sdd1 /mnt
 # cat /mnt/helloworld.txt
 hello world

To be continued...

We now have the most costly piece of a typical HA setup. 

In the next posts, I'll do a fail-over Oracle/Veritas cluster setup

and later a RAC setup
 
Additional notes, if needed to rework the iSCSI config

Logout:

  • either all nodes from a portal:

# iscsiadm -m node --logout -p 192.168.0.4,3260
Logging out of session [sid: 1, target: iqn.2012-09.com.philtortoise.example:testserver4.target1, portal: 192.168.0.4,3260]
Logout of [sid: 1, target: iqn.2012-09.com.philtortoise.example:testserver4.target1, portal: 192.168.0.4,3260] successful.

  • or specific IQN:

# iscsiadm -m node --logout iqn.2012-09.com.philtortoise.example:testserver4.target1
Logging out of session [sid: 1, target: iqn.2012-09.com.philtortoise.example:testserver4.target1, portal: 192.168.0.4,3260]
Logout of [sid: 1, target: iqn.2012-09.com.philtortoise.example:testserver4.target1, portal: 192.168.0.4,3260] successful.



# iscsiadm -m session
iscsiadm: No active sessions.


and 'fdisk' doesn't show the disk anymore



Delete the entries:

# iscsiadm -m node -o delete iqn.2012-09.com.philtortoise.example:testserver4.target1
# iscsiadm -m node
iscsiadm: no records found!

and /var/lib/iscsi has been cleaned a bit (at least the IQN is removed)


From there we can re-create a new config (Example: larger device)

 


Wednesday, October 24, 2012

bind / DNS quick start


DNS setup is again something I do once every few years (here I'm testing an Oracle RAC setup...).
So today I am writing down a short memo on a minimal setup to get started:

Example: Internal network 192.168.0.* , which I call 'philtortoise.com'
using 'bind9' on ubuntu 12.04


  • First, forward the requests to the ISP (or corporate top-level server). Here, simply back to my router:
in /etc/bind/named.conf (on Ubuntu this usually goes in /etc/bind/named.conf.options):

    forwarders {
         192.168.0.1;
    };


This forwards queries it can't answer to the name server at this IP. If there is no answer, it will try the root servers.
( Note: if we don't want to fall back to the root servers - in a corporate environment, for instance - add the line:    forward only;  )
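For context, the forwarders statement sits inside the options block; on Ubuntu it usually lives in /etc/bind/named.conf.options. A minimal sketch with the Ubuntu default cache directory:

    options {
        directory "/var/cache/bind";
        forwarders {
             192.168.0.1;
        };
        // forward only;    // uncomment to never fall back to the root servers
    };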

  • Then define the zones and associated resource record files  (again in named.conf)

    zone "philtortoise.com" {
        type master;         file "/etc/bind/db.philtortoise.com";
    };     

    zone "168.192.0.in-addr.arpa" {
 
       type master;
   
    file "/etc/bind/db.192.168.0";
    };


  • Then create the "RR" Resource Record files
    This is running on machine 192.168.0.4, called "opti", and I have a couple of hosts: opti, ngear (my router), rac1, rac2 (the RAC machines), etc.

in /etc/bind/db.philtortoise.com:
;
; resource record for my own local network on 192.168.0.0/24
;
$TTL    604800
@    IN    SOA    opti.philtortoise.com. philtortoise.gmail.com. (
             123362        ; Serial
             604800        ; Refresh
              86400        ; Retry
            2419200        ; Expire
             604800 )      ; Negative Cache TTL
;
@       IN      NS      opti.philtortoise.com.
opti    IN      A       192.168.0.4
ngear   IN      A       192.168.0.1
;
rac1    IN      A       192.168.0.201
rac2    IN      A       192.168.0.202
racvip  IN      A       192.168.0.203
docrac-scan IN  A       192.168.0.213
docrac-scan IN  A       192.168.0.214
docrac-scan IN  A       192.168.0.215

Explanation of some syntax:
$TTL  (T)ime (T)o (L)ive - how long to cache the resource record
@     abbreviation for the domain (replaces "philtortoise.com.")
SOA   (S)tart (O)f (A)uthority
      before it: the domain it is authoritative for; after it: the primary DNS server and the admin e-mail (with the @ replaced by a dot)
      the numbers in () instruct the slave(s) how frequently to check for new data
      the serial should be increased whenever the data changes
NS    Name Server: which machine is the NS (in production, several are listed);
      it needs an (A)ddress record defined later on.
A     Definition of an address (AAAA for IPv6).
      (We may define multiple addresses, as for the docrac-scan entry:
      the closest network is chosen, or round-robin.)

IN    This simply means "Internet"; historically DNS has also been used for other protocols.

Others:
CNAME (C)anonical NAME: a pointer from one name to another.
      Typically used to redirect, say, www.philtortoise.com to whichever server currently does the job
PTR   See below, pointer for reverse translation from address to name
MX    Mail exchange: points to mail server(s) for a domain, with priorities:
      hp.com. IN MX 0 mailserver1.it-usa.hp.com.
      hp.com. IN MX 1 postoffice1.it-eur.hp.com.

  • Tips to test:

# service bind9 stop
# /usr/sbin/named -g -d 3 -u bind

As it turns out:
'-g' is necessary to both stay in the foreground and see all messages (it forces logging to stderr),
and below debug level 3 (-d 3) I don't see all the DNS requests!
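Before starting the daemon, the config and zone files can also be validated with the checkers that ship with bind9:

# named-checkconf /etc/bind/named.conf
# named-checkzone philtortoise.com /etc/bind/db.philtortoise.com
# named-checkzone 0.168.192.in-addr.arpa /etc/bind/db.192.168.0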

Then I test with the 'host' command, which conveniently lets you choose the DNS server IP on the command line:


# host rac1.philtortoise.com 192.168.0.4
Using domain server:
Name: 192.168.0.4
Address: 192.168.0.4#53
Aliases:

rac1.philtortoise.com has address 192.168.0.201

Other tools:
nslookup rac1.philtortoise.com 192.168.0.4
dig...


Reverse lookup:

If we need to find the name back from the IP (many programs try to do that for logging, for example sshd or apache, depending on their configuration),
we need to provide a file that resolves things the other way around:

# cat /etc/bind/db.192.168.0
;
; BIND reverse data file for 192.168.0.*
;
$TTL    604800
@    IN    SOA    philtortoise.com. root.philtortoise.com. (
                  1        ; Serial
             604800        ; Refresh
              86400        ; Retry
            2419200        ; Expire
             604800 )      ; Negative Cache TTL
;
@      IN    NS     localhost.
1      IN    PTR    ngear.philtortoise.com.
4      IN    PTR    opti.philtortoise.com.
;
201    IN    PTR    rac1.philtortoise.com.
202    IN    PTR    rac2.philtortoise.com.
203    IN    PTR    racvip.philtortoise.com.


Test reverse lookup:
# host 192.168.0.202
202.0.168.192.in-addr.arpa domain name pointer rac2.philtortoise.com.
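dig can do the same while querying our server explicitly (the -x option builds the in-addr.arpa name for us; it should return rac2.philtortoise.com. as above):

# dig @192.168.0.4 -x 192.168.0.202 +short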




References:

As always, some versions may vary slightly. It's always better to look at the doc included with the package we're using!
/usr/share/doc/bind9 

Also a good read:
O'Reilly "DNS and Bind"