AIX: MONITORING & PERFORMANCE TUNING

A) Disk quota
The disk quota system controls the use of disk space.
Quotas are defined for individual users or groups.
Quotas are maintained separately for each journaled file system (JFS).

Disk quotas establish limits based on the following parameters:
-User's or group's soft limits
-User's or group's hard limits
-Quota grace period

Soft limit - The number of 1 KB disk blocks or the number of files below which the user must remain.
Hard limit - The maximum number of disk blocks or files the user can accumulate under the established disk quotas.

Quota grace period - Allows the user to exceed the soft limit for a short period of time (the default value is one week).

If the user fails to reduce usage below the soft limit during the specified time, the system interprets the soft limit as the maximum allocation allowed, and no further storage is allocated to the user until enough files are removed to bring usage back below the soft limit.
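
The way the three limits interact can be sketched as a small shell function. This only illustrates the rules described above; it is not how the quota system itself is implemented. The function name quota_state and its arguments (current usage, soft limit, hard limit, days already spent over the soft limit, grace period in days) are hypothetical.

```shell
# quota_state USED SOFT HARD DAYS_OVER [GRACE_DAYS]
# Prints "ok", "grace" (over the soft limit but still inside the grace
# period) or "denied" (no further allocation is possible).
# GRACE_DAYS defaults to 7, matching the one-week default grace period.
quota_state() {
    used=$1
    soft=$2
    hard=$3
    days_over=$4
    grace=${5:-7}
    if [ "$used" -ge "$hard" ]; then
        echo denied      # the hard limit can never be exceeded
    elif [ "$used" -le "$soft" ]; then
        echo ok          # at or below the soft limit
    elif [ "$days_over" -lt "$grace" ]; then
        echo grace       # over the soft limit, grace period still running
    else
        echo denied      # grace period expired: soft limit acts as the maximum
    fi
}
```

For example, quota_state 1500 1000 2000 3 prints grace, while the same usage after 8 days over the soft limit prints denied.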

Typically, only the filesystems that contain user home directories and files require disk quotas.

Consider implementing disk quotas when any of the following is true:
-Your system has limited disk space
-You require more filesystem security
-Your disk usage levels are large

The disk quota system can be used only with the journaled filesystem.

Do not establish quotas for /tmp. Many editors and system utilities create temporary files in the /tmp filesystem, so it must be free of quotas.

The filesystems to be managed must be defined with quotas in the /etc/filesystems file and must then be mounted or remounted.
The quotaon command looks for the default quota files, quota.user and quota.group, in the root directory of the associated filesystem.

Display quota
-->quota

Use the chfs command to add the userquota and groupquota configuration attributes to the /etc/filesystems file.
--> chfs -a "quota = userquota" /home
Enables user quotas on the /home filesystem.

--> chfs -a "quota = userquota,groupquota" /home
Enables both user and group quotas on /home.

The related entry in /etc/filesystems
/home:
        dev = /dev/hd1
        vfs = jfs
        log = /dev/hd8
        mount = true
        check = true
        quota = userquota,groupquota
        options = rw
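
To check which filesystems already carry a quota attribute, the stanzas can be scanned with awk. This is a minimal sketch under two assumptions: stanza headers are written as a mount point followed by a colon (for example /home:), and the helper name quota_filesystems is made up for this example.

```shell
# quota_filesystems FILE
# Prints the mount point of every stanza in an /etc/filesystems-style
# file that contains a "quota =" attribute.
quota_filesystems() {
    awk '
        # Stanza header such as "/home:" -> remember the mount point.
        /^\/.*:[ \t]*$/ { sub(/:[ \t]*$/, ""); fs = $0; next }
        # Attribute line "quota = ..." inside the current stanza.
        /^[ \t]*quota[ \t]*=/ { print fs }
    ' "$1"
}
```

Running quota_filesystems /etc/filesystems on a system configured as above would be expected to list /home.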

quota.user and quota.group are the default quota file names; the files are located in the root directory of each quota-enabled filesystem.

To use the names myquota.user and myquota.group instead
-->chfs -a "userquota=/home/myquota.user" -a "groupquota=/home/myquota.group" /home

Entry in /etc/filesystems
/home:
        dev = /dev/hd1
        vfs = jfs
        log = /dev/hd8
        mount = true
        check = true
        quota = userquota,groupquota
        userquota = /home/myquota.user
        groupquota = /home/myquota.group
        options = rw

To duplicate the quotas established for user joey for user ross
-->edquota -p joey ross

To run quotacheck and turn on quotas during system startup, add the following lines to /etc/rc
-->vi /etc/rc
echo " Enabling filesystem quotas"
/usr/sbin/quotacheck -a
/usr/sbin/quotaon -a

To enable user quotas for the /usr/Tivoli/server/db filesystem
-->quotaon -u /usr/Tivoli/server/db

Disable user and group quotas for all filesystems in the /etc/filesystems file
-->quotaoff -v -a

To display your own quotas while logged in as user joey
-->quota

To display the quotas of user ross as the root user
-->quota -u ross

B) Recovering from a full filesystem

1) Fix a full / (root) filesystem

a) Use the who command to read the contents of the /etc/security/failedlogin file
--> who /etc/security/failedlogin

b) TTYs that respawn too rapidly can create many failed login entries.
To clear the file after reading or saving the output, execute
-->cp /dev/null /etc/security/failedlogin

c) Check the /dev directory for a device name that was typed incorrectly. If rmto is typed instead of rmt0, an ordinary file called rmto is created in /dev.
The command that writes to it usually proceeds until / is full, because /dev is part of the / filesystem.
--> ls -l /dev | pg
Look for entries that are not valid devices:
-entries that do not have a major and minor number (a device file shows two numbers separated by a comma where an ordinary file shows its size)
-entries with a wrong file name
-entries with a file size greater than 500 bytes

d) If system auditing is running, the default /audit directory can fill up rapidly and require attention.

e) Check for large files with the find command
-->find / -xdev -size +2048 -ls | sort -r +6
Finds all files larger than 1 MB and sorts them in reverse order, with the largest files first. The -size value is counted in 512-byte blocks, so +2048 means more than 1 MB.
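
Because find -size counts 512-byte blocks, a threshold in megabytes has to be converted before it is passed to find. A minimal sketch of that conversion follows; the helper name mb_to_blocks is hypothetical.

```shell
# mb_to_blocks MEGABYTES
# Converts a size in MB to the number of 512-byte blocks used by find -size.
mb_to_blocks() {
    echo $(( $1 * 1024 * 1024 / 512 ))
}

# Example use (not executed here): files larger than 5 MB on the root filesystem
# find / -xdev -size +"$(mb_to_blocks 5)" -ls
```

With this helper, 1 MB corresponds to 2048 blocks and 5 MB to 10240 blocks.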

f) Before removing any file, check that it is not currently in use
-->fuser filename
If a file is open at the time of removal, it is removed only from the directory listing. The blocks allocated to the file are not freed until the process holding it open is killed.

2) Fix a full /var filesystem

Check the following

a)--> find /var -xdev -size +2048 -ls | sort -r +6
Look for large files (over 1 MB) in /var
b) Check for obsolete or leftover files in /var/tmp

c) Check the size of the /var/adm/wtmp file, which logs all logins, rlogins, and telnet sessions.
The log grows indefinitely unless system accounting is running; system accounting clears it out nightly.
To clear /var/adm/wtmp
-->cp /dev/null /var/adm/wtmp

To edit the /var/adm/wtmp file, first convert it to a temporary ASCII copy with the following command
-->/usr/sbin/acct/fwtmp < /var/adm/wtmp > /tmp/out
Edit the /tmp/out file to remove unwanted entries, then replace the original file with the following command
-->/usr/sbin/acct/fwtmp -ic < /tmp/out > /var/adm/wtmp

d) Clear the error log in the /var/adm/ras directory using the following procedure. The error log is never cleared unless it is cleared manually.
[Never use the cp /dev/null command to clear the error log. A zero-length errlog file disables the error logging functions of the operating system and must be replaced from a backup.]
-Stop the error daemon
-->/usr/lib/errstop
-Remove the error log file, or move it to a different filesystem
-->rm /var/adm/ras/errlog
or
-->mv /var/adm/ras/errlog filename
(filename is the new location of the moved file)
-Restart the error daemon
-->/usr/lib/errdemon
e) Check /var/adm/ras/trcfile. If it is large and a trace is not currently running, remove it
-->rm /var/adm/ras/trcfile
f) If your dump device is set to hd6 (the default), there might be a number of vmcore* files in the /var/adm/ras directory. Remove these files if they are old.
g) Check /var/spool/, which contains the queuing subsystem files.
To clear the queuing subsystem
-->stopsrc -s qdaemon
-->rm /var/spool/lpd/qdir/*
-->rm /var/spool/lpd/stat/*
-->rm /var/spool/qdaemon/*
-->startsrc -s qdaemon
h) Check /var/adm/acct/, which contains accounting records. If accounting is running, this directory may contain several large files.
i) Check /var/preserve/ for terminated vi sessions. If a user wants to recover a session, use
-->vi -r
to list all recoverable sessions.
To recover a specific session
--> vi -r filename
j) Check the /var/adm/sulog file, which records the number of attempted uses of the su command and whether each was successful. (The file is recreated automatically.)
k) Check /var/tmp/snmpd.log, which records events from the snmpd daemon. (The file is recreated automatically.)
The size of this file can be limited in /etc/snmpd.conf

3) Fix a full user-defined filesystem

+--> find /fs -xdev -size +2048 -ls | sort -r +6
Check for files larger than 1 MB (2048 blocks of 512 bytes). /fs stands for the mount point of the full filesystem.

+Remove old backup files and core files.
-->find / \( -name "*.bak" -o -name core -o -name a.out -o -name ".*.bak" -o -name ed.hup \) -atime +1 -mtime +1 -type f -print | xargs -e rm -f
This removes *.bak, .*.bak, a.out, core, and ed.hup files that have been neither accessed nor modified for more than one day.

+To prevent files from regularly overflowing the disk, run
--> skulker
as part of the cron process to remove files that are unnecessary or temporary.

+--> find /var -xdev -mtime 0 -ls
Locate files that have been changed in the last 24 hours.

4) Fix a damaged filesystem
Filesystems become corrupted when the i-node or superblock information for the directory structure of the filesystem is damaged, for example by a hardware error or a corrupted program.

Symptom of a corrupted filesystem
-The system cannot locate, read, or write data located in the particular filesystem.

Solution
1) Unmount the damaged filesystem
-->smit unmountfs (for a filesystem on a fixed disk drive)
-->smit unmntdsk (for a filesystem on a removable disk)
2) Assess the filesystem damage by running fsck on the unmounted filesystem
-->fsck /dev/myfilelv
fsck checks and repairs inconsistent filesystems.
3) If the filesystem cannot be repaired, restore it from backup

C) The system error log
+Error logging is started automatically by the rc.boot script during system initialization and is stopped automatically by the shutdown script during shutdown.
The errdemon program starts the error logging daemon, which reads error records from the /dev/error file and writes entries to the system error log. The default system error log is the /var/adm/ras/errlog file.
The last error entry is always kept in NVRAM; during system startup it is read from NVRAM and added to the error log when the error daemon starts.

-->/usr/lib/errdemon
Starts the error logging daemon. It starts at boot, but you can restart it manually after a failure.

-->/usr/lib/errstop (use carefully, only in special cases)
Stops the error logging daemon, which disables diagnostic and recovery functions. Error logging should never be stopped during normal operations.

-->/usr/lib/errdemon -l
Displays the error log attributes, including the path to your system's error log file.

-->/usr/lib/errdemon -s 2000000
Changes the maximum size of the error log file (in bytes).

-->/usr/lib/errdemon -B 64000
Changes the size of the error log device driver's internal buffer (in bytes).

*The errpt command
Retrieves the entries in the error log.
To display a complete summary report of the errors that have been recorded (errpt does not perform error log analysis)
-->errpt

To display all the errors that have a specific error ID
-->errpt -j 8527F64

To display all the errors logged in a specific period of time
-->errpt -s 1122160405 -e 1123160405
The timestamps use the format mmddhhmmyy (month, day, hour, minute, year), so this reports errors logged between 16:04 on November 22 and 16:04 on November 23 of year 05.
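
Because the mmddhhmmyy layout is easy to get wrong by hand, the timestamp can be built from a more familiar date and time. The helper below is a hedged sketch: the name errpt_stamp is hypothetical, and it assumes the input is given as YYYY-MM-DD and HH:MM.

```shell
# errpt_stamp YYYY-MM-DD HH:MM
# Rearranges a date and time into the mmddhhmmyy form used by errpt -s and -e.
errpt_stamp() {
    echo "$1 $2" |
        sed 's/^..\(..\)-\(..\)-\(..\) \(..\):\(..\)$/\2\3\4\5\1/'
}

# Example use (not executed here):
# errpt -s "$(errpt_stamp 2005-11-22 16:04)" -e "$(errpt_stamp 2005-11-23 16:04)"
```

For the start time used above, errpt_stamp 2005-11-22 16:04 prints 1122160405.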

*The errclear command
Deletes entries from the error log.
To delete all entries from the error log
-->errclear 0

*The errlogger command
The errlogger command lets you log operator messages to the system error log.
A message can be up to 1024 bytes long.
-->errlogger "This is a test of the errlogger command"
-->errpt
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME DESCRIPTION
AA8AB241   1129134705 T O OPERATOR      OPERATOR NOTIFICATION

To display the operator notification that was generated (ID AA8AB241)
-->errpt -a -j AA8AB241
The detailed report includes the message text:
This is a test of the errlogger command

*Extracting error records from a system dump

The errdead command extracts error records from a system dump that contains the internal buffer maintained by the /dev/error file, and adds those records directly to the error log.
[The error log daemon must not be running when the errdead command is run]
For example, to capture error log information from a dump image that resides in the /dev/hd7 file
-->/usr/lib/errdead /dev/hd7

*Redirecting syslog messages to error log

*Commands for manipulating error messages
errinstall
errupdate
errmsg
ras_logger

D) The system log configuration
The /etc/syslog.conf file controls the behaviour of the syslog daemon. syslogd uses the /etc/syslog.conf file to determine where to send error messages and how to react to different system events.

-The /etc/syslog.pid file contains the process ID of the running syslogd daemon.

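+Format of the configuration file /etc/syslog.conf
Each entry has 3 parts
-Facility (which part of the system generated the message)
-Priority (how serious the message is)
-Destination (where the message is sent)

Facilities --
kern - kernel
user - user level
mail - mail subsystem
daemon, auth, syslog, lpr, news, uucp
* - all facilities

Priorities -- in decreasing order of severity
emerg - emergency messages, normally sent to all users
alert - conditions that need immediate attention, such as hardware errors
crit - critical conditions, such as improper login attempts
err - errors, such as an unsuccessful disk write
warning - abnormal but recoverable conditions
notice - important informational messages
info - less important informational messages
debug - debugging messages
none - excludes the selected facility

Destinations --
File name - Full path name of a file, which is opened in append mode
Host - Host name, preceded by @
User - One or more user names, separated by commas
* - All logged-in users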

+Using the system log
After customizing the /etc/syslog.conf file, restart the syslogd daemon
--> stopsrc -s syslogd
--> startsrc -s syslogd

A few examples of /etc/syslog.conf entries:
1) To log all mail facility messages at the debug level and above to the file /tmp/mailsyslog
mail.debug /tmp/mailsyslog

2) To send all system messages except those from the mail facility to a host named barney
*.debug;mail.none @barney

3) To send messages at the emerg priority level from all facilities, and messages at the crit priority level and above from the mail and daemon facilities, to users joey and ross
*.emerg;mail,daemon.crit joey,ross

4) To send all mail facility messages to the terminal screens of all users
mail.debug *
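
A rule written for one priority also selects every more severe priority, which is why mail.debug catches all mail messages. The sketch below illustrates that ordering only; it is not part of syslogd, and the function names prio_rank and prio_selected are hypothetical.

```shell
# prio_rank NAME
# Prints the severity rank of a priority name (0 = most severe).
prio_rank() {
    case "$1" in
        emerg)   echo 0 ;;
        alert)   echo 1 ;;
        crit)    echo 2 ;;
        err)     echo 3 ;;
        warning) echo 4 ;;
        notice)  echo 5 ;;
        info)    echo 6 ;;
        debug)   echo 7 ;;
        *)       return 1 ;;   # unknown priority name
    esac
}

# prio_selected RULE_PRIORITY MESSAGE_PRIORITY
# Succeeds when a message of MESSAGE_PRIORITY is selected by a rule
# written for RULE_PRIORITY (same level or more severe).
prio_selected() {
    rule=$(prio_rank "$1") || return 2
    msg=$(prio_rank "$2") || return 2
    [ "$msg" -le "$rule" ]
}
```

For example, prio_selected crit emerg succeeds, while prio_selected crit debug fails because debug is less severe than crit.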

E) Performance tools overview
1) vmstat
Reports statistics about kernel threads, virtual memory, disks, traps, and CPU activity.
Used to balance system load activity.
-->vmstat
Summary of the virtual memory activity since system startup

+Display five summaries at 1-second intervals
-->vmstat 1 5
The first number is the interval in seconds and the second is the number of reports. The first report covers activity since system startup; each later report covers the preceding interval.

+Display the count of various events
-->vmstat -s

+Display five summaries for hdisk0 and hdisk1 at 2-second intervals
-->vmstat hdisk0 hdisk1 2 5

+Display the number of forks since system startup
-->vmstat -f

2) sar
Collects, reports, and saves system activity information.
+To report current activity every two seconds, five times
-->sar 2 5

+To report activity for the first two processors every second for the next five seconds
-->sar -u -P 0,1 1 5
(-P 0,1 selects processors 0 and 1; 1 is the interval in seconds; 5 is the number of reports)

3) topas

Reports vital statistics about the activity on the local system on a character terminal.
It extracts and displays statistics with a default interval of two seconds, including
-Overall system statistics
-List of busiest processes
-WLM statistics
The bos.perf.tools and perfagent.tools filesets must be installed on the system to run topas.

Parameters shown by topas
-CPU utilization
user, system, wait, and idle percentages
-Network interfaces
list of NICs, throughput, data received, data transmitted
-Physical disks
list, Busy%, KBPS, TPS
-WLM classes
-Processes
name, process ID, CPU utilization, paging space used
-Events/queues
-File/TTY
-Paging
-Memory
-Paging space
-NFS
-->topas -P
Busiest processes
-->topas -D
Disk metrics
-->topas -i5 -n0 -p10
View the top 10 processes at 5-second intervals without displaying any network interface statistics

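*svmon
Captures and analyzes a snapshot of virtual memory. It shows the current state of memory and helps to detect memory leaks.
-->svmon -P pid -i 1 3
Reports memory usage for process pid three times at 1-second intervals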

*netstat
-->netstat -i
Verify the status of all network interfaces
-->netstat -in
MAC and IP addresses of each interface in numeric form, together with the MTU size
-->netstat -rn
Routing table
-->netstat -Cn
Display route costs if you have multiple routes with different costs to the same destination
-->netstat -m
Statistics for the memory buffers (mbufs) that the kernel uses for communication
-->netstat -v ent0 | more
Device driver information
-->netstat -s
Statistics for all protocols: icmp, udp, tcp, igmp, ip
-->netstat -p icmp
Statistics for one particular protocol (icmp here; ip and the other protocols are selected the same way)
-->netstat -a (or -an for numeric addresses)
All sockets opened on your system

5) iostat
Reports CPU statistics, asynchronous I/O (AIO) statistics, and I/O statistics for the entire system, adapters, TTY devices, disks, and CD-ROMs.
Use iostat
-When there are performance problems
-After hardware and software changes to the disk subsystem
-After changes to the attributes of a VG, LV, or filesystem
-After changes to the operating system
-After changes to an application

To determine whether a physical disk has become a performance bottleneck
-->iostat -T -d 1 60
Monitors disk activity for 60 seconds at 1-second intervals.
Check the %tm_act and Kbps columns.

To display more detailed statistics about a disk, artificially create disk activity on hdisk0 in the background and then generate 10 performance reports at 2-second intervals
-->dd if=/dev/hdisk0 of=/dev/null &
-->iostat -D hdisk0 2 10

Display CPU utilization
-->iostat -T -t 1 60
Monitors CPU activity for 60 seconds

AIO utilization
-->iostat -A

List mounted filesystems and their AIO queues
-->iostat -AQ

Adapter utilization
-->iostat -a 1 10 | more
-->iostat -a -D | more

6) Procmon tool (graphical)
Lets you view and manage the processes running on a system.
The default refresh interval is 5 seconds.
For each process it shows the priority, nice value, running time, CPU usage, memory usage, amount of I/O performed, and the user who created it.
The following fileset must be installed
-bos.perf.gtools
-->./opt/perfwb/procmon/procmon/
-->tty
terminal number

To wait for a process to finish and display its status, use procwait
-->find / -type f > /dev/null 2>&1 &
-->procwait -v $!
The find command runs in the background; $! is the process ID of that background job, and -v reports the status when it ends.

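F) Tuning using the /etc/tunables files
/etc/tunables/nextboot - Tunable values that are applied at the next boot
/etc/tunables/lastboot - Tunable values as they were set at the last boot
/etc/tunables/lastboot.log - Log of the creation of the lastboot file, listing the parameters that were changed and those that could not be changed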

G) Documenting a system configuration
List device attributes
-->lsattr -El ent0
Display the status and location code of all disk devices
-->lsdev -Cc disk
Display the characteristics and capabilities of hot-plug PCI slots
-->lsslot -c pci
Display the system machine type and serial number
-->lscfg -vp | grep -ip cabinets
Display information about the hardware and software configuration
-->prtconf

Monday, 26 May 2014
