Wednesday, April 21, 2010

How to add device for SNMP Trap Monitoring in Nagios

An SNMP trap is much like a syslog message: the device sends error messages to a Network Management System (NMS) such as Nagios. Nagios doesn’t support SNMP traps by default. An add-on called SNMPTT (SNMP Trap Translator) translates received SNMP traps and passes them to the Nagios console. To install SNMPTT on Nagios, I used the guide "How to recieve SNMP Trap in Nagios". After that, you may follow the steps below to load additional SNMP trap MIBs for each managed device.

1) Load and compile MIBS to Nagios
This is the command to compile a MIB on the Nagios server:
snmpttconvertmib --in= --out=/etc/snmp/snmptt.conf. --exec='/usr/local/nagios/libexec/eventhandlers/submit_check_result $r TRAP 1'

Running this by hand gets tedious when there are many MIB files. Therefore, I wrote a simple bash script called “loadMIBS” to compile all the MIBs in a folder:
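#!/bin/bash
# loadMIBS: convert every MIB file in a folder into one SNMPTT config file.
if [ $# -ne 2 ]; then
    echo "Usage: loadMIBS 'folder' 'device'" >&2
    exit 1
fi
for file in "$1"/*; do
    # Skip anything that is not a regular file (e.g. subfolders)
    [ -f "$file" ] || continue
    /usr/sbin/snmpttconvertmib --in="$file" \
        --out="/etc/snmp/snmptt.conf.$2" \
        --exec='/usr/local/nagios/libexec/eventhandlers/submit_check_result $r TRAP 1'
done
echo "MIBS loaded in /etc/snmp/snmptt.conf.$2"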

2) Tell SNMPTT about the newly compiled files
Modify /etc/snmp/snmptt.ini to include the files generated earlier:

[TrapFiles]
snmptt_conf_files = <<END
/etc/snmp/snmptt.conf.devicename1
/etc/snmp/snmptt.conf.devicename2
END
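
As the number of devices grows, keeping this list in sync by hand is easy to get wrong. The sketch below prints a [TrapFiles] section for every snmptt.conf.* file found in a directory, using the "<<END ... END" list syntax from the stock snmptt.ini. The function name print_trapfiles_section is hypothetical, and the default directory is simply the path used in this post. It only prints the section; pasting it into snmptt.ini is still a manual step.

```shell
#!/bin/bash
# Print a [TrapFiles] section listing every snmptt.conf.* file in a directory.
# The directory defaults to /etc/snmp, the location used in this post.
print_trapfiles_section() {
    local conf_dir="${1:-/etc/snmp}"
    local file
    echo "[TrapFiles]"
    echo "snmptt_conf_files = <<END"
    for file in "$conf_dir"/snmptt.conf.*; do
        # An unmatched glob stays literal, so confirm the file really exists
        [ -f "$file" ] && echo "$file"
    done
    echo "END"
}
```

Calling print_trapfiles_section with no argument lists the files under /etc/snmp; pass another directory to preview the output elsewhere first.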

3) Add the new Device to Nagios configuration file
I have created a standard file to consolidate all SNMP Trap devices at /usr/local/nagios/etc/objects/snmptrap.cfg. Just follow the example below:

define host{
        use             windows-server  ; Inherit default values from a template
        host_name       HostA
        alias           HostA
        address         xx.xx.xx.xx     ; IP address of the host
}

define host{
        use             windows-server  ; Inherit default values from a template
        host_name       HostB
        alias           HostB
        address         xx.xx.xx.xx     ; IP address of the host
}

define hostgroup{
        hostgroup_name  snmp_group      ; The name of the hostgroup
        alias           SNMP TRAP
        members         HostA, HostB
}

define service{
        hostgroup_name  snmp_group
        use             snmptrap-service
        contact_groups  netadmin        ; Who to alert & contact
}

4) Define New TRAP service on Nagios
Separately, in templates.cfg, I added this SNMP trap service template:
# define snmp trap service for network
define service{
        use                     generic-service
        name                    snmptrap-service
        check_command           check-host-alive
        service_description     TRAP
        passive_checks_enabled  1
        register                0
        is_volatile             1
        check_period            none
        max_check_attempts      1
        normal_check_interval   1
        retry_check_interval    1
        notification_interval   31536000
        notification_options    w
}

Note: the service description must match the submit_check_result parameter, i.e. TRAP in this case. Otherwise, Nagios won't be able to match the received SNMP trap to the passive service.
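
To see why the names must match, it helps to look at what reaches Nagios. A passive result arrives as a PROCESS_SERVICE_CHECK_RESULT external command line that carries the host name, the service description, a return code and the output text. The sketch below only formats such a line; the function name format_passive_result and the sample message are made up for illustration, and nothing is written to the Nagios command file.

```shell
#!/bin/bash
# Build the external command line for a passive service check result.
# Arguments: host, service description, return code, output text, [timestamp]
format_passive_result() {
    local host="$1" service="$2" code="$3" output="$4"
    # Fall back to the current time when no timestamp is supplied
    local now="${5:-$(date +%s)}"
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$now" "$host" "$service" "$code" "$output"
}

# HostA reports a trap as WARNING (return code 1) against the service TRAP
format_passive_result HostA TRAP 1 "Link down on interface 2" 1271800000
# prints: [1271800000] PROCESS_SERVICE_CHECK_RESULT;HostA;TRAP;1;Link down on interface 2
```

The third field is the one Nagios compares with service_description, so a result sent for a different name would not be matched to the TRAP service defined above.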

5) Verifying New SNMP Trap Service

Restart the Nagios service and generate a test SNMP trap from your managed device. If you do not receive an alert (email and/or SMS), do the following:
  • Check that the SNMP trap daemon is running, e.g. ps -e | grep trap
  • Check the SNMPTT log to confirm that the trap was received
  • Click on the "Event Logs" of the Nagios admin console and check that the event handler "submit_check_result" was executed correctly.
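
The first two checks can be scripted roughly as follows. This is a sketch under assumptions: the default log path /var/log/snmptt/snmptt.log may differ on your system (check the log settings in snmptt.ini), and the function names are hypothetical.

```shell
#!/bin/bash
# Succeed if a process with "trap" in its name (e.g. snmptrapd) is running
trap_daemon_running() {
    ps -e | grep -q 'trap'
}

# Count the lines in an SNMPTT log that mention a given host.
# The default log path is an assumption, not something SNMPTT guarantees.
count_traps_from() {
    local host="$1"
    local log_file="${2:-/var/log/snmptt/snmptt.log}"
    # -F matches the host name literally; -c prints the number of matching lines
    grep -cF -- "$host" "$log_file"
}
```

For example, run trap_daemon_running and then count_traps_from HostA after sending the test trap. A count of 0 suggests the trap never reached SNMPTT. Note that the match is a plain substring match, so HostA would also count lines for a host named HostAB.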

Saturday, April 17, 2010

Redirect network traffic (ICMP redirect)

In some legacy Ethernet LANs, you may encounter a flat network with one huge subnet and hundreds or even thousands of PCs on it. As the network grows in complexity, more gateways are added to link this LAN to more external networks. On most PCs, you would expect only a default route to exist. How are the PCs able to send traffic to external networks without static routes being added to them? This classic example from Cisco uses ICMP redirect.

For example, the two routers R1 and R2 are connected to the same Ethernet segment as Host H. The default gateway for Host H is configured to use router R1. Host H sends a packet to router R1 to reach the destination on Remote Branch office Host 10.1.1.1. Router R1, after it consults its routing table, finds that the next-hop to reach Host 10.1.1.1 is router R2. Now router R1 must forward the packet out the same Ethernet interface on which it was received. Router R1 forwards the packet to router R2 and also sends an ICMP redirect message to Host H. This informs the host that the best route to reach Host 10.1.1.1 is by way of router R2. Host H then forwards all the subsequent packets destined for Host 10.1.1.1 to router R2.
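
The decision router R1 makes can be sketched as a small check. It models only two conditions: the packet leaves through the interface it arrived on, and the source host sits on the same subnet as the next hop. Real routers apply further conditions (for instance, redirects must be enabled and the packet must not be source-routed). The function names, the interface name eth0 and the 192.168.1.x addresses are hypothetical; the example above does not give addresses for H, R1 or R2.

```shell
#!/bin/bash
# Convert a dotted-quad IPv4 address to a 32-bit integer
ip_to_int() {
    local IFS=.
    # Split the address on dots into $1..$4
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed if two addresses share a subnet for the given prefix length
same_subnet() {
    local prefix="$3"
    local mask=$(( (4294967295 << (32 - prefix)) & 4294967295 ))
    [ $(( $(ip_to_int "$1") & $mask )) -eq $(( $(ip_to_int "$2") & $mask )) ]
}

# Decide whether to send an ICMP redirect.
#   $1 ingress interface   $2 egress interface
#   $3 source IP           $4 next-hop IP        $5 prefix length
should_redirect() {
    [ "$1" = "$2" ] && same_subnet "$3" "$4" "$5"
}

# Host H (192.168.1.10) sends via R1; the next hop R2 (192.168.1.2) is on the same /24
if should_redirect eth0 eth0 192.168.1.10 192.168.1.2 24; then
    echo "send ICMP redirect: use 192.168.1.2"
fi
```

If the next hop were reached through a different interface, or sat on a different subnet from Host H, the check would fail and R1 would simply forward the packet.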



ICMP redirect is enabled on most Cisco routers by default. However, Cisco security devices (e.g. PIX/ASA) by default do not allow traffic to leave through the same interface on which it arrived. To permit this same-interface redirection, issue this command: same-security-traffic permit intra-interface

Friday, April 2, 2010

Failover Clustering with StarWind iSCSI

Recently, I attended a seminar by a well-known iSCSI SAN vendor. The product is good and provides all kinds of storage virtualization features (except deduplication), including replication and thin provisioning. The main selling point is the frameless architecture, which is scalable and lets you manage all of their SAN boxes as one virtual instance. The main drawbacks are pricing and the fact that it can't interoperate with other storage solutions. Hence, there is potential vendor lock-in.

I recalled that an MVP speaker in Las Vegas had introduced a software iSCSI target called "StarWind iSCSI". I decided to give it a try and set it up on my home network. The setup looks like this:


To prevent single box failure, I mirrored 2 virtual volumes across both StarWind servers, which now act as iSCSI SAN boxes (joining them to the domain would make administration even easier). Creating the virtual HA volumes and exporting them as iSCSI targets is easy with StarWind when you follow the vendor's step-by-step guide.

Next, I set up a 2-node failover cluster, with both nodes able to connect to the iSCSI targets over MPIO. I added a file service to the cluster with a large musical video file that I captured in Las Vegas. I mapped a drive from a client PC on the public network and started playing the video. During playback, I purposely shut down the StarWind1 server (the source target). The video paused for a few seconds before the partner (StarWind2) took over. I'm impressed.

Thursday, April 1, 2010

Storage Virtualization

As we implement Microsoft virtualization, storage space is being used up rapidly. Another issue is storage availability. As we cluster more VM hosts, the shared cluster storage remains a major single point of failure. Even if storage is fully replicated within a single site, a site-wide disaster (such as a flood or fire) can wipe out all the data in a short time. This is where storage virtualization comes into the picture. Wikipedia defines storage virtualization as the abstraction (separation) of logical storage from physical storage.

Let's take a look at the jargon and how each technique can help solve the above issues, mainly over-provisioning (which leads to high costs) and availability/DR.
  • RAID: Some say RAID is the earliest form of storage virtualization, as a logical volume can span multiple disks to survive a single disk failure.
  • I/O Multipathing (MPIO): There are several components between a server and its storage (adapters, cables, switches, controllers). If one or more of these components fails, causing the path to fail, multipathing logic uses an alternate path for I/O so that the servers & applications can still access their data.
  • Remote synchronization: To eliminate storage as a single point of failure, data is replicated over the network between two separately located storage boxes on a per-volume basis. The servers see a single logical volume, although it may span different storage boxes. It is also essential for implementing multi-site failover clustering on Windows 2008 servers.
  • Thin provisioning: It is easier and less troublesome to extend a volume than to shrink it. Hence, most administrators tend to over-provision storage space for applications. To reduce wastage, thin provisioning allows administrators to provision a large volume while only a small fraction is actually allocated, until the applications occupy more space over time.
  • Thin replication: Replication of a "thinly" provisioned volume. Only the delta changes are replicated across, which saves network bandwidth.
  • Point-in-time snapshot: To simplify data restoration & DR recovery, periodic snapshots of the storage are taken over time. They allow you to roll back data to certain points in time.
  • Deduplication: When you implement server virtualization or VDI, much of the data in the VHDs is identical. Deduplication further optimizes storage space by storing duplicated data only once.
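
To get a feel for how much deduplication could save, the sketch below cuts a file into fixed-size blocks, hashes each block and compares the total with the number of distinct hashes. Fixed-size block hashing is just one simple way to illustrate the idea and is not any particular vendor's method. The function name block_dedup_stats and the 4096-byte default are assumptions, and the script relies on GNU coreutils.

```shell
#!/bin/bash
# Print "<total blocks> <unique blocks>" for a file cut into fixed-size blocks.
# Assumes GNU coreutils (split, sha256sum, mktemp) and a non-empty input file.
block_dedup_stats() {
    local file="$1" block_size="${2:-4096}"
    local work_dir digests total unique
    work_dir=$(mktemp -d)
    # Cut the file into fixed-size pieces, one temporary file per block
    split -a 6 -b "$block_size" -- "$file" "$work_dir/blk."
    # Identical blocks produce identical digests
    digests=$(sha256sum "$work_dir"/blk.* | cut -d' ' -f1)
    total=$(echo "$digests" | wc -l)
    unique=$(echo "$digests" | sort -u | wc -l)
    rm -rf "$work_dir"
    echo "$(( total )) $(( unique ))"
}
```

Run against a VHD (for example block_dedup_stats disk.vhd 4096, where disk.vhd is a placeholder name), the gap between the two numbers hints at how many blocks a deduplicating store would not need to keep twice. Large images need matching free space in the temporary directory.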