Friday, 12 December 2014

The boot process



Booting involves the following steps:
1)POST (power-on self-test): verifies that the basic hardware is in a functional state. Memory, keyboard, communication devices, audio devices and storage are initialized.
2)Press a key and choose a boot device.
3)The System Read Only Storage (ROS) locates and loads the bootstrap code (software ROS).
4)The software ROS (bootstrap code) forms an IPL control block, which takes control and builds AIX-specific boot information.
5)A special file system, the RAMFS, is created in memory.
6)Software ROS is AIX information based on the machine type; it is responsible for completing machine preparation so that the AIX kernel can start.
7)The files that are part of the BLV (boot logical volume) are kept in the /usr/lib/boot directory. The BLV contains:
-the AIX kernel
-boot commands called during the boot process, such as bootinfo and cfgmgr
-the configuration methods of the many devices that must be configured before hd4 is made available; these devices are marked as base in PdDv
-the rc.boot script (/sbin/rc.boot)
8)The AIX kernel is loaded and takes control (LED 0299).
The kernel completes the boot process by configuring devices and starting the init process.
9)So far the system has tested the hardware, found a BLV, created the RAMFS and started the init process from the BLV.
The rootvg has not been activated yet.
From now on, the rc.boot script is called three times, and is passed a different parameter each time.

*Boot Phase 1
During this phase, the following steps are taken:
1)The init process started from the RAMFS executes the boot script rc.boot 1.
LED c06 is displayed if this fails.

2)The restbase command is called to copy a partial image of the ODM from the BLV into the RAMFS.
510 - successful
548 - not successful

3)-->cfgmgr -f configures the base devices that are necessary to access rootvg.
[For example, if rootvg is located on a hard disk, all devices from the motherboard up to the disk have to be initialized.]

4)At the end of boot phase 1, the bootinfo -b command is called to determine the last boot device (LED 511).

*Boot Phase 2
In boot phase 2, the rc.boot script is passed the parameter 2.
During this phase the following steps are taken:
1)The rootvg volume group is varied on with the command
-->ipl_varyon
517 - successful
552, 554, 556 - unsuccessful

2)-->fsck -f /dev/hd4
Verifies whether the hd4 file system was unmounted cleanly before the last shutdown.
555 - not successful

3)The root file system (/dev/hd4) is mounted on a temporary mount point (/mnt) in the RAMFS.
-->mount /dev/hd4 /mnt
557 - unsuccessful

4)-->fsck -f /var

5)-->fsck -f /usr

6)Paging space /dev/hd6 activated

7)mergedev process copies all /dev files from the RAMFS to disk.

8)Customized ODM files from the RAMFS are copied to disk. Both ODM versions from hd4 & hd5 are synchronized.

9)root fs, var, usr mounted

10)There is no console available at this stage, so all boot messages are copied to alog.
The alog command maintains and manages logs.
-->alog -t boot -o
Displays the boot log.
-->alog -o -f bosinst.log
Displays the contents of the log file bosinst.log.
-->smit alog_show

*Boot Phase 3
Now rootvg is activated and the following steps are taken:
1)The /etc/init process is started. It reads the /etc/inittab file and calls rc.boot with argument 3.

2)The /tmp file system is mounted.

3)-->syncvg
Synchronizes rootvg.
553 - successful

4)-->cfgmgr -p3 (if system is booted in service mode)
-->cfgmgr -p2 (if system is booted in normal mode)

5)-->cfgcon
Configures the console. After the console is configured, boot messages are sent to the console.
All missed messages can be found in
-->/var/adm/ras/conslog

6)The ODM in the BLV is synchronized with the ODM from the / (root) file system by the savebase command
-->savebase

7)The syncd daemon and errdemon are started.

8)The LED display is turned off.

9)If /etc/nologin exists, it is removed.

10)Execution of rc.boot has completed.

*System Initialization
During system startup, the following sequence of events occurs:
1)The init command is run as the last step of the startup process.

2)The init command attempts to read the /etc/inittab file and tries to locate an initdefault entry.
-If the initdefault entry exists, the init command uses the specified run level as the initial system run level.
-If the initdefault entry does not exist, the init command requests a run level from the user.
-If the user types S, s, M or m, the init command enters the maintenance run level. These run levels do not require a properly formatted /etc/inittab file.

3)If the /etc/inittab file does not exist, the init command places the system in the maintenance run level by default.

4)The init command rereads the /etc/inittab file every 60 seconds.

*/etc/inittab file
The /etc/inittab file controls the initialization process.
It is composed of entries that are position-dependent and have the following format:
Identifier:Runlevel:Action:Command (Command is the shell command to execute)
Runlevel 0 - Halt
1 - Single-user/maintenance mode
2 - Multiuser mode
3 to 9 - Customized modes

Current runlevel:
-->cat /etc/.init.state

Action: tells the init command how to treat the process specified in the Command field. The following actions are recognized by the init command:
--respawn - Restart always: start the process if it does not exist, and start it again whenever it dies.
--wait - Start the process and wait for its termination.
--once - When the init command enters a run level that matches the entry's run level, start the process and do not wait for its termination; when it dies, do not restart it.
--boot - Process the entry only during system boot: start the process, do not wait for it, and let it run or die without restarting it.
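The record format above can be illustrated with a minimal POSIX shell sketch that splits a record into its four fields and checks whether it applies at a given run level. inittab_field and runs_at_level are hypothetical helpers, not AIX commands (on a real system, lsitab is the supported way to read records), and the sketch assumes the Runlevel field is a plain list of run-level characters such as 2 or 23456789.

```sh
#!/bin/sh
# Hypothetical helper: print field $2 (1=Identifier, 2=Runlevel, 3=Action,
# 4=Command) of the inittab record $1. Only the first three colons act as
# separators, so a Command that itself contains ':' is returned intact.
inittab_field() {
    printf '%s\n' "$1" | {
        IFS=: read -r id rl act cmd
        case $2 in
            1) printf '%s\n' "$id" ;;
            2) printf '%s\n' "$rl" ;;
            3) printf '%s\n' "$act" ;;
            4) printf '%s\n' "$cmd" ;;
        esac
    }
}

# Hypothetical helper: succeed if run level $2 (one character, e.g. 2)
# is listed in the Runlevel field of record $1 (e.g. "2" or "23456789").
runs_at_level() {
    levels=$(inittab_field "$1" 2)
    case $levels in
        *"$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

rec='xcmd:2:respawn:find / -type f > /dev/null 2>&1'
inittab_field "$rec" 3                  # prints respawn
runs_at_level "$rec" 2 && echo "xcmd runs at run level 2"
```

Reading the record with IFS=: and four variables is what keeps any extra colons inside the Command field.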

-->telinit q
Rescans the inittab file so that the changes take effect.

*Handling the /etc/inittab file (no flat-file editing in AIX: use these commands instead of editing the file directly)
-->mkitab
Adds records to the /etc/inittab file.

-->lsitab
Lists records in the /etc/inittab file.

-->chitab
Changes records in the /etc/inittab file.

-->rmitab
Removes records from the /etc/inittab file.
For example, to add a record to the /etc/inittab file that runs the find command in run level 2 and starts it again once it has finished:
1)Add a record named xcmd to /etc/inittab using the mkitab command
-->mkitab "xcmd:2:respawn:find / -type f > /dev/null 2>&1"

2)-->lsitab xcmd
xcmd:2:respawn:find / -type f > /dev/null 2>&1

3)-->ps -ef | grep find
root 25462 ...

4)-->kill 25462

5)-->ps -ef | grep find
root ... find / -type f
The process keeps respawning: init starts it again with a new PID.

-->rmitab xcmd
Deletes the xcmd record from /etc/inittab.

-->cat /etc/inittab

*Order of the /etc/inittab entries
Entries in the /etc/inittab file are ordered according to dependencies: if a process (process 2) requires another process (process 1) to be present to operate normally, the entry for process 1 comes before the entry for process 2.
For example, the SRC has to be started near the beginning of the /etc/inittab file, since the SRC daemon is needed to start other processes. NFS requires the TCP/IP daemons to run correctly, so the TCP/IP daemons are started ahead of the NFS daemons.
The usual order is:
1)initdefault
2)sysinit
3)Powerfailure Detection(powerfail)
4)Multiuser check (rc)
5)/etc/Firstboot(fbcheck)
6)System Resource Controller (srcmstr)
7)start TCP/IP daemon
8)Start NFS daemons(rcnfs)
9)cron
10)pb cleanup
11)getty for the console(cons)
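The dependency rule can be spot-checked on a real inittab by comparing the line positions of two records. A minimal sketch; entry_line and comes_before are hypothetical helper names, and they only look at the Identifier field.

```sh
#!/bin/sh
# Hypothetical helper: print the line number of the first record in the
# inittab-format file $1 whose Identifier (first ':'-separated field) is $2.
entry_line() {
    awk -F: -v id="$2" '$1 == id { print NR; exit }' "$1"
}

# Hypothetical helper: succeed only if records $2 and $3 both exist in
# file $1 and the record for $2 (the prerequisite) comes first.
comes_before() {
    first=$(entry_line "$1" "$2")
    second=$(entry_line "$1" "$3")
    [ -n "$first" ] && [ -n "$second" ] && [ "$first" -lt "$second" ]
}
```

For example: comes_before /etc/inittab srcmstr rctcpip && echo "SRC starts before TCP/IP". The identifier names are assumptions (rctcpip is typically the TCP/IP entry, rcnfs the NFS entry); list the real ones with lsitab -a first.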

*How to recover from a non-responsive boot process
The bootlist command displays and alters the list of boot devices available to the system.
+Normal bootlist- The normal list shows the possible boot devices for the system when it is booted in normal mode.
+Service bootlist-List of possible devices when the system is booted in service mode.
+Previous boot device - The last device from which the system booted.
1)-->bootlist -m normal -o
Displays the normal boot list, for example:
cd0
hdisk0 blv=hd5
rmt0

2)To make changes to your normal boot list, for example to remove rmt0, use the command
-->bootlist -m normal cd0 hdisk0

or
create a file containing the list cd0 hdisk0 (one device per line, or separated by spaces)
-->bootlist -m normal -f filename

3)After changing the bootlist, verify the bootlist as follows
--> bootlist -m normal -o

Accessing a system that will not boot
If you are not able to boot the system, you first have to access it:
+Turn on all attached external devices (terminals, CD-ROM, DVD-RAM, tape drives, monitors, external disk drives) before turning on the system unit.
+Put installation CD 1 in the CD drive and turn on (restart) the system unit so that the installation media is loaded.
+Start maintenance mode for system recovery: Access a Root Volume Group, then access this volume group and start a shell before mounting the file systems.
+Then take the appropriate measures to recover data, or take action to enable the system to boot normally.

*Common boot LED codes & solutions
1)LED 201 - Damaged boot image
a)Access your rootvg from maintenance mode.
b)Check the size of the / and /tmp file systems.
c)Determine the boot disk by using
-->lslv -m hd5
d)Recreate the boot image using
-->bosboot -a -d /dev/hdiskn
where n is the number of the disk containing the BLV.
e)Check for CHECKSTOP errors in the error log.
These errors indicate failing hardware.
f)Restart.

2)LED 223-229 - Invalid boot list
If the system does not boot, boot into maintenance mode to log in, then correct the boot list with the bootlist command.

3)LED 551, 555 & 557 - Corrupted file system, corrupted JFS log
a)Maintenance mode - choice 2 - Access the rootvg before mounting any file system.
b)Verify and correct the file systems (they must be unmounted before they are checked):
-->fsck -y /dev/hd1
-->fsck -y /dev/hd2
-->fsck -y /dev/hd3
-->fsck -y /dev/hd4
-->fsck -y /dev/hd9var
c)Format the JFS log again using
-->/usr/sbin/logform /dev/hd8
d)Use
-->lslv -m hd5
to find the disk on which hd5 is present.
e)Recreate the boot image
-->bosboot -a -d /dev/hdiskn

4)LED 552, 554, 556
Superblock corrupted or corrupted customized ODM database
a)Maintenance mode.
b)Check all file systems.
c)If fsck indicates that block 8 is corrupted, the superblock for the file system is corrupted and needs to be repaired
-->dd count=1 bs=4k skip=31 seek=1 if=/dev/hdn of=/dev/hdn
(this copies the backup superblock, 4 KB block 31, over the primary superblock, 4 KB block 1)
d)Rebuild the JFS log
-->/usr/sbin/logform /dev/hd8
If this solves the problem, stop here; otherwise continue with the steps below.
e)Your ODM database is corrupted. Restart the system, go to maintenance mode and select choice 2 (access rootvg before mounting any file system).
f)Mount the root and /usr file systems
-->mount /dev/hd4 /mnt
-->mount /usr
g)Copy the system configuration to a backup directory
-->mkdir /mnt/etc/objrepos/backup
-->cp /mnt/etc/objrepos/Cu* /mnt/etc/objrepos/backup
h)Copy the configuration from the RAM file system
-->cp /etc/objrepos/Cu* /mnt/etc/objrepos
-->umount all
i)-->lslv -m hd5
j)Save the clean ODM to the boot logical volume by using
-->savebase -d /dev/hdiskn (the disk containing the BLV)
k)Reboot.
l)If the problem persists, reinstall the BOS.
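The dd command in step c) copies 4 KB block 31 of the logical volume over 4 KB block 1 of the same logical volume. To see what skip and seek do without touching a real disk, the sketch below runs the same dd arguments against a scratch image file; restore_superblock_copy is a hypothetical helper name. It adds conv=notrunc, which the real command does not need: on a regular file, dd would otherwise truncate the output at the seek offset (destroying block 31 before it is read), which does not happen on a device such as /dev/hdn.

```sh
#!/bin/sh
# Hypothetical helper: copy 4 KB block 31 over 4 KB block 1 of file $1,
# in place. Only run this against a scratch image, never a live device.
restore_superblock_copy() {
    # conv=notrunc stops dd from truncating a regular file at the seek offset
    dd count=1 bs=4k skip=31 seek=1 if="$1" of="$1" conv=notrunc 2>/dev/null
}

# Build a zero-filled 32-block (128 KB) scratch image.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=4k count=32 2>/dev/null
# Put a recognisable marker at the start of block 31.
printf 'BACKUP' | dd of="$img" bs=4k seek=31 conv=notrunc 2>/dev/null

restore_superblock_copy "$img"

# Block 1 now starts with the marker copied from block 31.
dd if="$img" bs=4k skip=1 count=1 2>/dev/null | dd bs=1 count=6 2>/dev/null
echo
rm -f "$img"
```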

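5)File system and boot issues (quick reference)
-File system issue: check /var, /tmp and /usr with df
-hd5 is the BLV: -->bootlist -m normal -o
-->bosboot -a -d /dev/hdiskn
-->lslv -m hd5
-->fsck -y /dev/hd1 (and likewise /dev/hd2, /dev/hd3, /dev/hd4, /dev/hd9var)
-Boot issue: bootlist, bosboot, lslv -m hd5
-/etc/inittab issue: check /, /var, /usr and /tmp with df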

6)LED 553 Corrupted /etc/inittab file
a)Go to maint. mode
b)check space in /, /var,/tmp using df
c)check /etc/inittab file correct any problems
d)check problems with
/etc/environment file basic environment for processes.
/bin/sh
/bin/bsh
/etc/fsck
/etc/profile
/.profile
e)restart
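The LED codes in this section can be kept at hand as a small lookup function. led_hint is a hypothetical helper; it only knows the codes listed above and summarizes the corresponding fix.

```sh
#!/bin/sh
# Hypothetical helper: map a boot LED code from the table above to a hint.
led_hint() {
    case $1 in
        201)         echo "Damaged boot image: rebuild it with bosboot" ;;
        22[3-9])     echo "Invalid boot list: correct it with bootlist" ;;
        551|555|557) echo "Corrupted file system or JFS log: run fsck and logform" ;;
        552|554|556) echo "Corrupted superblock or customized ODM database" ;;
        553)         echo "Corrupted /etc/inittab: check df and /etc/inittab" ;;
        *)           echo "Code $1 is not covered by this table" ;;
    esac
}

led_hint 553
```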

*Runlevel
-->cat /etc/.init.state

*Displaying a history of previous runlevels
1)Log in as the root user
-->/usr/lib/acct/fwtmp < /var/adm/wtmp | grep run-level

*Changing system runlevels
-->lsitab init
Checks the default runlevel.
Before changing the run level, first check that getty is enabled at all runlevels, because getty is what makes terminal access possible.

Use the wall command to inform users
-->wall message

-->telinit M
change runlevel to maintenance level

*telinit command
telinit directs the actions of the init process; it sets the system at a specific runlevel.
-->telinit {0-9 | S | s | M | m | a | b | c | Q | q | N}
-->telinit M
Enters maintenance mode.
-->shutdown -m
Also brings the system down to maintenance mode.

*An introduction to the rc* files
1)rc.boot: script called by init that controls the machine boot process. It varies on rootvg and enables the file systems.
-->/sbin/rc.boot file

2)/etc/rc
-varies on all volume groups marked as auto-varyon
-activates all paging spaces listed in /etc/swapspaces
-configures all dump devices
-performs file system checks
-mounts file systems
-performs installation-specific tasks

3)/etc/rc.net
File that contains the network configuration information. It enables the network interfaces and sets the host name, default gateway and static routes for the current host.
This script can also be run to configure new devices that have been added to the system since boot time.

4)/etc/rc.tcpip
Automatically executed, uses SRC commands to initialize selected daemons. Most daemons that can be initialized by rc.tcpip are specific to TCP/IP
-inetd (started by default)
-gated
-routed
-named
-timed
-rwhod
*Don't run gated and routed at the same time on a host.
Other daemons, specific to the BOS or to other applications, that can be started through rc.tcpip:
-lpd
-portmap
-sendmail
-syslogd(started by default)

-->stopsrc -s daemon
-->startsrc -s daemon


Friday, 30 May 2014

Some things to consider before changing settings

Some things to take notice

1)-->/usr/lib/errstop (use carefully, only in special cases)
Stops the error logging daemon and disables diagnostic and recovery functions. Error logging should never be stopped during normal operations.

2)The error log daemon must not be running when the errdead command is run

3)The bos.perf.tools and perfagent.tools filesets must be installed on the system to run topas.

4)Procmon tool(Graphical)
Must install below fileset
-bos.perf.gtools

5)The error log is never cleared unless it is manually cleared.
[Never use the cp /dev/null command to clear the error log. A zero-length error log file disables the error logging functions of the operating system and must be replaced from a backup. Use the errclear command to clear the error log instead.]


6)The traceroute command puts load on the system, so don't use it on a production server.


7)Don't run gated & routed at the same time on a host


8)ODM commands
Use ODM commands only when no other option is left.
If they are used incorrectly the system might crash, so it is better to wait for IBM Support.

9)Do not restart the TCP/IP daemons using the command
-->startsrc -g tcpip
It will start all subsystems defined in the ODM for the tcpip group, which includes both
routed & gated.

10)Do not run fsck command on mounted filesystem.

Monday, 26 May 2014

Daily Management


A)User Administration

/etc/passwd
/etc/security/passwd
Attribute - a characteristic of a user or a group that defines the type of functions the user or group can perform. These can be extraordinary privileges, restrictions, or processing environments assigned to a user:
-Access rights
-environment
-Authentication
-Account access

Files
1)/etc/security/environ-Environment attributes for users
2)/etc/security/lastlog-Last login attributes for users
3)/etc/security/limits-Process resource limits for users
4)/etc/security/user-Extended attributes for users
5)/usr/lib/security/mkuser.default-Default attributes for new users.
6)/usr/lib/security/mkuser.sys-Customize new user accounts.
7)/etc/passwd-Basic attributes of users
8)/etc/security/passwd-Password information
9)/etc/security/login.cfg-System default login parameters
10)/etc/utmp-Records of users logged into the system.
11)/var/adm/wtmp-Connect time accounting records.
12)/etc/security/failedlogin-Records all failed login attempts.
13)/etc/motd-Message to be displayed, every time a user logs in to the system.
14)/etc/environment-Basic environment settings for all users.
15)/etc/profile-Additional Environment settings  for all users
16)$HOME/.profile-Environment settings for a specific user
17)/etc/group-Basic attributes of groups
18)/etc/security/group-Extended attributes of groups

+ /etc/passwd
Name:Password:UserID:PrincipalGroup:GECOS:HomeDirectory:Shell
* in the Password field - invalid password
! in the Password field - the encrypted password is stored in the /etc/security/passwd file
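Because /etc/passwd is colon-separated, awk can pull out the fields listed above. A minimal sketch; passwd_summary is a hypothetical helper name, and it only interprets the two password-field markers described here.

```sh
#!/bin/sh
# Hypothetical helper: for every entry of a file in /etc/passwd format,
# print name, UID, home directory, shell and the meaning of the password field.
passwd_summary() {
    awk -F: '{
        if ($2 == "!")      state = "password kept in /etc/security/passwd"
        else if ($2 == "*") state = "invalid password"
        else                state = "other password field"
        printf "%s uid=%s home=%s shell=%s (%s)\n", $1, $3, $6, $7, state
    }' "$1"
}
```

For example: passwd_summary /etc/passwd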

+/etc/utmp
-->who -a

+/etc/profile
First file that the OS uses at login time; it contains the umask, mail and tty settings.

+$HOME/.profile-2nd file os uses at login time
-shells to  open
-Envir variables
-Default editor
-Prompt appearence

B)User Administration tasks

1)Adding a new  user account
-To create the smith account with smith as an administrator
-->mkuser -a smith
-To create the user account smith with the default values in /usr/lib/security/mkuser.default
-->mkuser smith
-->smitty mkuser

2)-->passwd
Change your passwd
-->smitty passwd

3)Do not use chuser if you have NIS.
-To change the expiration date for smith to 8 a.m., 1 Dec. 1998
-->chuser expires=1201080098 smith
(format MMDDhhmmyy: month, day, hour, minute, year)

-To add smith to the group program (note that groups= sets the complete list of groups for smith)
-->chuser groups=program smith
-->smitty chuser
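The expires value is a packed MMDDhhmmyy string. A minimal sketch that builds it from its parts; expires_value is a hypothetical helper name. Each part is padded to two digits and the year is reduced to its last two digits.

```sh
#!/bin/sh
# Hypothetical helper: build a chuser "expires" value (MMDDhhmmyy).
# Pass plain decimal numbers without leading zeros (8, not 08), because
# some printf implementations read a leading 0 as an octal number.
expires_value() {
    # $1 month, $2 day, $3 hour (0-23), $4 minute, $5 four-digit year
    printf '%02d%02d%02d%02d%02d\n' "$1" "$2" "$3" "$4" $(( $5 % 100 ))
}

expires_value 12 1 8 0 1998    # prints 1201080098
```

For example: chuser expires=$(expires_value 12 1 8 0 1998) smith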

4)lsuser, smitty users
-->lsuser smith
displays all attributes of user smith in default format.
-Display all attributes of all users
-->lsuser ALL

5)Removing a user account,
-->smitty rmuser
-Remove smith
-->rmuser smith
-Remove smith together with all of its attributes and its password/authentication information
-->rmuser -p smith

6)-->who
-->whoami
-->who -r(runlevel)
-Display any active process that was spawned by init
-->who -p

7)/etc/nologin
If it exists, the system accepts the user's name and password but prevents the user from logging in.

8)-->chsh
change user's login shell attribute.

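9)/etc/security/limits - Specifies the process resource limits for each user
fsize, core, data, rss and stack are in 512-byte blocks; cpu is in seconds.
default/prashant: (the default stanza, or a stanza for an individual user such as prashant)
fsize=2097151 largest file a user's process can create or extend
core=2097151 largest core file a user's process can create
cpu=-1 maximum number of seconds of CPU time a user's process can use (-1 means unlimited)
data=262144 largest data segment of a user's process
rss=65536 largest physical memory a user's process can allocate
stack=65536 largest stack of a user's process
nofiles=2000 maximum number of files a user's process can have open at one time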
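Because /etc/security/limits (like the other /etc/security files shown here) is a stanza file, a specific value can be read with awk and converted from 512-byte blocks. A minimal sketch; stanza_get and blocks_to_mb are hypothetical helpers. It assumes stanza headers start in column 1 and end with a colon, attribute lines have the form key = value, and comment lines start with *. Unlike the AIX commands (lsuser, lssec), it does not fall back to the default stanza when a user's stanza omits an attribute.

```sh
#!/bin/sh
# Hypothetical helper: print the value of attribute $3 in stanza $2 of
# the stanza file $1 (no fallback to the default stanza).
stanza_get() {
    awk -v want="$2:" -v attr="$3" '
        /^\*/ { next }                                   # comment line
        /^[^ \t].*:[ \t]*$/ { in_stanza = ($1 == want); next }  # header
        in_stanza {
            n = index($0, "=")
            if (n == 0) next
            key = substr($0, 1, n - 1)
            val = substr($0, n + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            if (key == attr) { print val; exit }
        }
    ' "$1"
}

# Hypothetical helper: convert a limit in 512-byte blocks to whole
# megabytes (truncated); -1 means unlimited.
blocks_to_mb() {
    if [ "$1" = "-1" ]; then
        echo unlimited
    else
        echo $(( $1 * 512 / 1048576 ))
    fi
}

blocks_to_mb 2097151    # prints 1023 (about 1 GB)
```

For example: blocks_to_mb "$(stanza_get /etc/security/limits default data)"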

10)/etc/security/environ-Environment attributes for user.
mkuser creates a user stanza in this file.
Initialization of attributes depends upon their values in the /usr/lib/security/mkuser.default file.
chuser - to change attributes
lsuser- display attributes
rmuser-removes entire record for a user
ex.
-->pg /etc/security/environ
default:
root:
daemon:
bin:
sys:
adm:
uucp:
guest:

11)/usr/lib/security/mkuser.default
Contains the default attributes for new users.
This file has the default values of the attributes for the users created by the mkuser command.
-->pg /usr/lib/security/mkuser.default
user:
pgrp=staff
groups=staff
shell=/usr/bin/ksh
home=/home/$USER
admin:
pgrp = system
groups= system
shell = /usr/bin/ksh
home=/home/$USER

12)/etc/security/lastlog - Last login attributes for users.
username:
time_last_login=1134081482 time of the last successful login, in seconds since the epoch (00:00:00 UTC, 1 January 1970)
tty_last_login=/dev/pts/6 terminal on which the user last logged in
host_last_login host from which the user last logged in
unsuccessful_login_count=0 number of unsuccessful login attempts since the last successful login.
This attribute works with the user's loginretries attribute, specified in the /etc/security/user file, to lock the user's account after a specified number of consecutive unsuccessful login attempts.
To reset the counter for a user:
-->chsec -f /etc/security/lastlog -s username -a unsuccessful_login_count=0

13)/etc/security/user -extended attributes for user
mkuser creates a stanza in this file for each new user and initializes its attributes  with the default attributes defined in the /usr/lib/security/mkuser.default file
This file also contains many attributes that control how users must manage their passwords, such as histsize, histexpire, maxage (maximum age of a password, in weeks), maxexpired and maxrepeats.

14)/usr/lib/security/mkuser.sys
shell script that customizes a new user account.
Creates homedir, primary group, profile, for user's shell.

15)/etc/passwd
Basic user attributes
Name:Password:UserID:PrincipalGroup:GECOS:HomeDirectory:Shell

16)/etc/security/passwd-Contains passwd information.
A user who has an invalid password (*) in the /etc/passwd file has no entry in the /etc/security/passwd file.
ex.
root:
    password = CHbXMXLTUO1
    lastupdate = 1134028556
    flags =

17)/etc/security/login.cfg
System default login parameters,
configuration information for login and user authentication
default:
sak_enabled=false
logintimes=
logindisable=0
logininterval=0
loginreenable=0
logindelay=0
usw:
shells=/bin/sh, /bin/bsh, /bin/csh
maxlogins=32767
logintimeout=60
auth_type=STD_AUTH

18)/etc/utmp
Record of users logged into the system
-->who -a
processes this file; if the file is missing or corrupted, the who command generates no output.
/var/adm/wtmp
Connect-time accounting records.

19)/etc/security/failedlogin-All failed login attempts
-To change the /dev/tty0 port to lock automatically if five unsuccessful login attempts occur within 60 seconds
-->chsec -f /etc/security/login.cfg -s /dev/tty0 -a logindisable=5 -a logininterval=60
-f: name of the stanza file to modify
-s: name of the stanza to modify
-To unlock the /dev/tty0 port
-->chsec -f /etc/security/portlog -s /dev/tty0 -a locktime=0
-To allow logins from 8:00 a.m. until 5:00 p.m. for all users
-->chsec -f /etc/security/user -s default -a logintimes=0800-1700
-PS1 - primary prompt
-->echo "$PS1"
-Change the prompt
-->export PS1="root@$(hostname)# "

*mkgroup
-Create a new group account called managers and set yourself as the administrator
-->mkgroup -A managers
-Create a new group account called managers and set the list of administrators to steve and mike
-->mkgroup adms=steve,mike managers

*chgroup -->smit chgroup (don't use if you have NIS)
-Changes attributes for groups
-To add sam and carol to the finance group, which currently has only frank as a member
-->chgroup users=sam,carol,frank finance
-->chgroup users=u1,u2,u3 dbm
-To remove frank from the finance group but retain sam and carol, and to remove the administrators of the finance group
-->chgroup users=sam,carol adms= finance

*chgrpmem: changes the administrators or members of a group
-To remove joey as an administrator of the friends group
-->chgrpmem -a - joey friends
-To add members rachel and phoeby to the group friends
-->chgrpmem -m + rachel,phoeby friends
-To list the members and administrators of group friends
-->chgrpmem friends

ACL examples
attributes: SUID
base permissions:
owner (frank): rw_
group (system): r_x
others: _ _ _
extended permissions:
enabled
permit rw_ u:dhs
deny r_ _ u:chas, g:system (user chas is denied read access while he is a member of group system)
specify r_ _ u:john, g:gateway, g:mail (while john is a member of both the gateway and mail groups, he has read access only)
permit rw_ g:account, g:finance
-->aclget filename
-Change the shell to /usr/bin/ksh for user prashant
-->chsh prashant /usr/bin/ksh
-To enable user smith to access this system remotely
-->chuser rlogin=true smith

C)Common login errors
1)3004-004 : You attempted to logout, when processes are still running
2)3004-007: Invalid login name or password
3)3004-008:Failed credentials
4)3004-009:Damaged login shell
5)3004-030:Caps lock on
6)3004-302:Account has expired
7)3004-687:User does not exist

D)Monitoring & Managing processes
-)Display all processes
--> ps -e -f
-)Display processes owned by ross, joey, chandler
-->ps -f -l -u ross,joey,chandler
-)Display info about all processes & kernel threads
-->ps -emo THREAD
-)list all 64 bit processes
--> ps -M
-)kill
-->kill PID
-->kill -kill 2098 1048
kill processes
-->kill -kill 0
To stop all of your processes and log yourself off
-)To stop all the processes you own
-->kill -9 -1

+ nice and renice
-nice runs another command at a different priority.
-renice changes the priority of an already running process.
-nice values range from 0 (highest priority) to 39 (lowest priority).
-renice increments range from -20 (highest) to 20 (lowest).
-->renice -n 5 -p 987 32
Process IDs 987 and 32 get lower scheduling priorities.
-->renice -n -4 -g 324 25
Process groups 324 and 25 get higher scheduling priorities (this requires root authority).
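The arithmetic behind renice -n can be sketched as follows. This assumes the increment is added to the current nice value, that the result is clamped to the 0 to 39 range described above, and that 20 is the usual starting value; new_nice is a hypothetical helper, not a system command.

```sh
#!/bin/sh
# Hypothetical helper:
# $1 = current nice value (0-39; 20 is the usual default)
# $2 = renice -n increment (-20 to 20)
# Prints the resulting nice value, clamped to 0-39 (assumption).
new_nice() {
    n=$(( $1 + $2 ))
    if [ "$n" -lt 0 ]; then n=0; fi
    if [ "$n" -gt 39 ]; then n=39; fi
    echo "$n"
}

new_nice 20 5     # prints 25
```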
+ fuser
-To list the process numbers and user login names of processes using  the /etc/filesystems
-->fuser -u /etc/filesystems
-To terminate all of the processes using a given filesystem
-->fuser -k -x -u -c /dev/hd1
or
-->fuser -kxuc /home
You might want to use this command if you are trying to unmount the /dev/hd1 file system and a process that is accessing it prevents this.

-)To list all processes that are using a file that has been deleted from a given filesystem
-->fuser -d /usr
what is still active in the filesystem

-)To return the processID, for all processes that have
open references within a specified filesystem
-->fuser -xc /tmp
fuser shows only user processes, not system or kernel processes.
-->find /home -type d -exec fuser -u {} \;

E)File and directory permissions and ownership

+Access Control lists
The major task in administering access control is to define the group memberships of users, because these memberships determine the users' access rights to the files that they do not own.
With ACL permissions, you can permit or deny file access to specific individuals or groups without changing the base permissions
+Base Permissions
-owner group others
-r,w,x

+Attributes
setuid (SUID): when the owner sets the SUID bit on an executable file, anyone who is allowed to execute it runs it with the privileges of the file's owner. The bit only has an effect if the owner also has execute (x) permission.

+SUID is related only to the execute (x) permission
+small 's': the execute permission is there
+big 'S': the execute permission is not there
+SUID is set only on files
-->chmod ug+s filename
(sets both the SUID and the SGID bits)
+setgid (SGID)

*Extended Permissions
ex. of ACL
attributes : SUID
base permission :
owner(frank):rw_
group(system):r_x
others:_ _ _

extended permissions: (optional)
enabled (this keyword turns the extended permissions on)
permit rw_ u:dhs
deny r_ _ u:chas, g:system
specify r_ _ u:john, g:gateway, g:mail
permit rw_ g:account, g:finance
-)To display the access control  information for the status file
-->aclget status

*Setting Access Control Information(aclput)
-)To set  the access control information for the status file with the access control information stored in the acldefs file
-->aclput -i acldefs status
-)To set the access control information for the status file with the same information used for the plans file
-->aclget plans | aclput status

*acledit
-)To edit acl info of plans file
-->acledit plans

*chmod
Modifies the mode bits and the extended access control lists (ACLs) of the specified files or directories.
+Permission for directories
r-list
w-create,delete
x-cd

-->chmod go-w+x mydir
Removes write permission and adds execute permission for group and others on mydir.

-->chmod u=rwx,go=r filename
The user gets all permissions; group and others get read permission only.
-)To recursively descend directories and change file and directory permissions for the given tree structure
-->chmod -R 777 f*
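The rwx triplets shown in the base permissions map to the octal digits used by chmod (r=4, w=2, x=1). A small sketch of that conversion; perm_to_octal is a hypothetical helper. It does not understand the s, S and t markers, so strip those first; any character other than r, w or x in a position (such as - or _) counts as 0.

```sh
#!/bin/sh
# Hypothetical helper: convert nine permission characters such as
# rwxr-x--- (or rw_r_x___) to the three base octal digits.
perm_to_octal() {
    out=
    for start in 1 4 7; do
        triple=$(printf '%s\n' "$1" | cut -c "$start-$((start + 2))")
        digit=0
        case $triple in r??) digit=$((digit + 4)) ;; esac
        case $triple in ?w?) digit=$((digit + 2)) ;; esac
        case $triple in ??x) digit=$((digit + 1)) ;; esac
        out=$out$digit
    done
    echo "$out"
}

perm_to_octal rwxr-x---    # prints 750
```

For example: chmod "$(perm_to_octal rw-r--r--)" filename sets mode 644.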

*chown
changes the owner of the file
-) How to change the owner of the file program.c
-->chown prashant program.c
-)change the owner & group of all files in the directory /tmp/src to owner john & group build
-->chown -R john:build /tmp/src/

*chgrp
changes the group associated with the specified file or directory
-)Change the group ownership of the file or directory named test to production
-->chgrp production test
-)Change the group ownership of the directory named production, and of all the files and subdirectories under it, to test
--> chgrp -R test production

*Cron & crontab
--> crontab -l
Lists your crontab file (crontab files are kept in the /var/spool/cron/crontabs directory).
-Crontab entry: 0,15,30,45 8-17 * * 1-5 /home/script1
Executes a command called script1 every 15 minutes from 8:00 a.m. to 5:45 p.m., Monday through Friday.
-->crontab -e
To create and update  the crontab file.
The crontab command invokes the editor.
-->crontab -v
To check the crontab submission time
-->crontab -r prashant
Removes the /var/spool/cron/crontabs/prashant file
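To check by hand when an entry such as 0,15,30,45 8-17 * * 1-5 fires, the time fields can be matched as in this sketch. cron_field_matches and cron_runs are hypothetical helper names. The sketch handles only *, single numbers, ranges (a-b) and comma lists; it does not handle step values like */15, and it simply requires all five fields to match, without modelling any special rule cron may apply when both day-of-month and day-of-week are restricted. Day-of-week uses 0 for Sunday.

```sh
#!/bin/sh
# Hypothetical helper: succeed if the number $2 is allowed by one crontab
# time field $1 (*, N, A-B, or a comma list of N and A-B).
cron_field_matches() {
    field=$1
    value=$2
    [ "$field" = "*" ] && return 0
    old_ifs=$IFS
    IFS=,
    set -f
    set -- $field              # split "0,15,30,45" into separate parts
    set +f
    IFS=$old_ifs
    for part in "$@"; do
        case $part in
            *-*)               # range such as 8-17
                lo=${part%-*}
                hi=${part#*-}
                if [ "$value" -ge "$lo" ] && [ "$value" -le "$hi" ]; then
                    return 0
                fi
                ;;
            *)                 # single number
                if [ "$value" -eq "$part" ]; then
                    return 0
                fi
                ;;
        esac
    done
    return 1
}

# Hypothetical helper: $1 = the five time fields of an entry;
# $2..$6 = minute hour day-of-month month day-of-week (0 = Sunday).
cron_runs() {
    set -f                     # keep "*" from being expanded as a file pattern
    set -- $1 "$2" "$3" "$4" "$5" "$6"
    set +f
    cron_field_matches "$1" "$6" &&
        cron_field_matches "$2" "$7" &&
        cron_field_matches "$3" "$8" &&
        cron_field_matches "$4" "$9" &&
        cron_field_matches "$5" "${10}"
}

# 09:15 on a Monday in December: prints "runs"
cron_runs "0,15,30,45 8-17 * * 1-5" 15 9 1 12 1 && echo runs
```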

+crontab files are kept in /var/spool/cron/crontabs/
Each cron user has a crontab file with their username
as the filename in this dir.

+crontab
minute, hour, day-of-month, month, day-of-week command
+If the cron.allow file exists, only users whose login names appear in it can use the crontab command.
The root user name must appear in the cron.allow file, if the file exists.
If only the cron.deny file exists, any user whose name does not appear in the file can use the crontab command.
+A user cannot use the crontab command if one of the following is true:
-the cron.allow file and the cron.deny file do not exist (only root can use crontab)
-the cron.allow file exists but the user's login name is not listed in it
-the cron.deny file exists and the user's login name is listed in it
-->cat > /var/adm/cron/cron.allow
root
deploy
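The cron.allow / cron.deny rules above can be expressed as a short decision function. may_use_crontab is a hypothetical helper; it takes the directory holding the two files as a parameter (on AIX that is /var/adm/cron, as in the example above), expects one login name per line, and assumes root may still use crontab when neither file exists.

```sh
#!/bin/sh
# Hypothetical helper:
# $1 = directory holding cron.allow and/or cron.deny
# $2 = login name. Succeeds if that user may use the crontab command.
may_use_crontab() {
    dir=$1
    user=$2
    if [ -f "$dir/cron.allow" ]; then
        # cron.allow takes precedence: only listed users may use crontab.
        grep -qxF -e "$user" "$dir/cron.allow"
        return
    fi
    if [ -f "$dir/cron.deny" ]; then
        # Only cron.deny exists: everyone not listed may use crontab.
        if grep -qxF -e "$user" "$dir/cron.deny"; then
            return 1
        fi
        return 0
    fi
    # Neither file exists: root only (assumption, as described above).
    [ "$user" = root ]
}
```

For example: may_use_crontab /var/adm/cron deploy && echo "deploy may use crontab"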

*-->crontab -e
edit
-->crontab -v
check crontab submission time

*Removing crontab file
Avoid running crontab -r when you are logged in as root. It removes the /var/spool/cron/crontabs/root file.
-->crontab -r
Do not run it  as root
-->mail denise < letter1
send the file letter1 as a message  to user denise
-->echo $PATH > path (redirects the output of the command to the file path)
-->cat path

-->cat file1
line1
-->cat file2
line2
-->cat file2>>file1
-->cat file1
line1
line2
+-->cat >test
test
ctrl+D
-->cat test
test

-)The chsec command changes the attributes stored in the security configuration stanza files.
-)To display the current environment variables
-->setsenv
-)To set the file size limit to 51,200 bytes (100 blocks of 512 bytes)
-->ulimit -f 100
ulimit sets or reports user resource limits as defined in the /etc/security/limits file.



Backup & Recovery


1)mksysb
mksysb command creates a bootable image of all mounted file systems on the rootvg volume group. You can use this backup command to restore a system to its original state.
User defined paging spaces,unmounted file systems, raw devices are not backed up.

+Data layout of a mksysb tape
BOSBOOT image | mkinsttape image | dummy .toc | rootvg data
-BOSBOOT image: kernel and device drivers needed to boot from the mksysb tape; created by bosboot.
-mkinsttape image:
 ./tapeblksz: block size the tape drive was set to when the mksysb command was run.
 ./image.data: information used during the BOS installation process, including the sizes, names, maps and mount points of the logical volumes and file systems in rootvg. You can customize it before running mksysb, or run
-->mksysb -i
to generate a new ./image.data file on the tape during the backup.
-->mkszfile
generates the ./image.data file.

+The dummy .toc is used so that the mksysb tape contains the same number of images as a BOS install tape.
>Excluding file systems from a backup
-->cat /etc/exclude.rootvg
^./tmp/
Then run
-->mksysb -e  /dev/rmt0
-e excludes the files that match the patterns in /etc/exclude.rootvg
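Before running mksysb -e, you can preview which paths a pattern file would exclude. This sketch assumes that the patterns in /etc/exclude.rootvg are applied with grep-style matching to path names of the form ./tmp/file; preview_excluded and preview_kept are hypothetical helpers, and the list of paths is one you generate yourself.

```sh
#!/bin/sh
# $1 = pattern file, one grep pattern per line (e.g. ^./tmp/)
# $2 = file listing paths one per line, in ./path form
# Hypothetical helper: paths the patterns would exclude from the backup.
preview_excluded() {
    grep -f "$1" "$2"
}
# Hypothetical helper: paths that would still be backed up.
preview_kept() {
    grep -v -f "$1" "$2"
}
```

For example: (cd / && find ./tmp ./etc -print) > /tmp/paths.txt, then preview_excluded /etc/exclude.rootvg /tmp/paths.txt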

>How to create a bootable system backup
-->smitty mksysb
You cannot run the mksysb command against a user volume group; use the savevg, tar, cpio or backup commands to back up user volume groups.

>List content of a mksysb image
To verify the content of an mksysb image
-->smitty lsmksysb

>Restore a mksysb image
A mksysb image enables you to restore the system image onto target systems that might not contain the same hardware devices or adapters, require the same kernel (uniprocessor or multiprocessor), or be the same hardware platform as the source system.
You have several  possibilities to restore the mksysb image.
1)If restoring on exactly the same machine , you can boot directly from the mksysb media and restore from there
2)If restoring on a different type of machine,  use cloning function -->smit alt_clone
3)If you do not want to interfere with the production environment,use alternate disk install using mksysb
4)If you want to restore only several files from the mksysb image
-->smitty restmksysb
use the (.)dot before the filename ,ex ./etc/hosts

>tctl command
-->tctl -f /dev/rmt0 rewind

C)Backup Strategies
3 types of backup methods
1-Full backup
2-Differential backup
3-Incremental backup

1.Full backup
All selected files are backed up, whether or not they have changed.
2.Differential backup
Only modified files are backed up, but only if they changed after the latest full backup. These backups are cumulative: once a file has been modified, it is included in every differential backup until the next full backup.
Advantages
-To restore, only the latest full backup and the latest differential backup media sets are needed.
-The backup window is smaller than for a full backup.
Disadvantage
-If data changes a lot between full backups, each differential backup grows larger, because it contains every file changed since the last full backup.

3.Incremental backup
Also backs up modified files only; however, an incremental backup compares the modification time of a file with the time of the last backup (either full or incremental). If the modification date is more recent than the last backup date, the file is backed up.
Advantages
-Backup window is smaller than a full backup
-Only the difference from a previous backup will be written on media.
Disadvantages
-To restore, the latest full backup and all the incremental backup media sets that follow it are needed.
-To restore a single file, tape handling can be intensive, because several media sets may have to be searched.
-A damaged or lost medium in the incremental set can mean disaster: the changes to the files on that medium may be lost forever.
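The following small Python model makes the difference between the differential and incremental selection rules concrete. It is not how any AIX command is implemented. It assumes a simplified backup history with integer timestamps. The `Backup` class and both functions are hypothetical names for illustration. `media_needed_for_restore` also assumes that only one kind of partial backup follows each full backup.

```python
from dataclasses import dataclass

KINDS = ("full", "differential", "incremental")


@dataclass
class Backup:
    kind: str  # one of KINDS
    time: int  # hypothetical integer clock value when the backup ran


def files_to_back_up(kind, mtimes, history):
    """Names of the files a backup of `kind` would copy.

    mtimes:  dict of file name -> last modification time
    history: earlier Backup records, oldest first
    """
    if kind not in KINDS:
        raise ValueError(f"unknown backup kind: {kind}")
    fulls = [b.time for b in history if b.kind == "full"]
    if kind == "full" or not fulls:
        return sorted(mtimes)  # full backup, or no full backup yet: copy everything
    if kind == "differential":
        cutoff = max(fulls)  # compared with the latest full backup only
    else:
        cutoff = max(b.time for b in history)  # latest backup of any kind
    return sorted(name for name, m in mtimes.items() if m > cutoff)


def media_needed_for_restore(history):
    """Backups needed to restore the newest state (history oldest first).

    Assumes the backups after the last full one are all of one kind.
    """
    start = max(i for i, b in enumerate(history) if b.kind == "full")
    later = history[start + 1:]
    if later and later[-1].kind == "differential":
        return [history[start], later[-1]]  # full + latest differential
    return [history[start]] + later  # full + every incremental since
```

Suppose there is a full backup at time 10 and an incremental backup at time 20. A file changed at time 15 is copied again by a differential backup but not by the next incremental one. This is why differential sets keep growing while incremental sets stay small.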

D)Related backup and restore commands
1)savevg -->smit savevg
To back up a user VG.
The savevg command uses a data file created by the mkvgdata command:
-->/tmp/vgdata/vgname/vgname.data
This vgname.data file contains information about a user VG. The savevg command uses this file to create a backup image that restvg can use when it restores the VG.
-->savevg -e -f /dev/rmt0 datavg
-e excludes the files listed in the /etc/exclude.vgname file
-u updates the /etc/dumpdates file with the raw device name of the filesystem and the time, date and level of the backup. You must specify the -u flag if you are making incremental backups.

2)restvg: --> smit restvg
Restores the uservg and all its containers and files
-->restvg -f /dev/rmt0 hdiskn

3)backup
Backs up files and filesystems.
-To back up all the files and subdirectories in the /home directory using full path names
-->find /home -print | backup -i -f /dev/rmt0
Because the files are archived using full path names, they will be written to the same paths when restored.
-To back up the / (root) filesystem
-->backup -0 -u -f /dev/rmt0 /
-0 (level zero) specifies that all files in the / (root) filesystem are backed up.
-u updates the /etc/dumpdates file
-To back up all the files in the / (root) filesystem that have been modified since the last level 0 backup
-->backup -1 -u -f /dev/rmt0 /

4)restore
Extracts files from archives created with the backup command.
To exclude data that you do not want to restore from a specific path, use find with -print and send the result to restore.
To restore an entire filesystem archive
-->restore -rvqf /dev/rmt0
-r restores the entire filesystem archive
-v verbose
-q the first volume is ready to use (do not prompt)
-f device
This restores the entire filesystem archived on the tape device /dev/rmt0 into the current directory.
To restore a specific directory and the contents of that directory from an archive
-->restore -xdvqf /dev/rmt0 /home/mike/tools
-x extracts files by their name
-d extracts all files and subdirectories in the /home/mike/tools directory
-->restore -d /vg-backup/latest-backup(file) hdisk2

5)tar
-->tar -cvf /dev/rmt0 /home
archive /home to the tape
-->tar -tvf /dev/rmt0
list the contents of the archive
-->tar -xvf /dev/rmt0
extract the archive (relative path names are extracted under the current directory)

6)cpio
To copy the files in the current directory whose names end with .c onto diskette
-->ls *.c | cpio -ov > /dev/fd0
To copy the current directory and all subdirectories onto diskette
-->find . -print | cpio -ov > /dev/fd0
This saves the directory tree that starts with the current directory (.) and includes all of its subdirectories and files.
-o reads file path names from standard input and copies those files to standard output
-v lists file names

7)pax
-->pax -wf /dev/rmt0 .
copy the contents of the current directory to the tape drive
-->pax -rw file1 /tmp
copy file1 to /tmp

8)gzip file
compress file (it is replaced by file.gz; gzip -c file > file.gz keeps the original)

9)gzip -d file.gz
decompress

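10)tcopy
-->tcopy /dev/rmt0 /dev/rmt1
copy the tape in /dev/rmt0 to the tape in /dev/rmt1
-->tcopy /dev/rmt0
show the layout of the tape (record and file sizes), e.g. of an mksysb image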

E)Verify the content of a backup media
It is good practice to verify the readability of backups. This avoids trouble at recovery time caused by tape incompatibilities, damaged media or missing files.
If the backup media has difficulties while reading the tape, check the steps below:
1.Make sure the media is not damaged; try another media.
2.Verify that you have the latest drivers installed for your backup device.
3.Check that the backup device is turned on.
4.Try the media on another server.
5.Change the block_size attribute of the tape drive to 0 (variable block size).






Problem Determination and Resolution

1)ping
-->ping
Determines the status of the network and of various remote hosts. It is used for
-Tracking and isolating H/W & S/W problems
-Testing, measuring & managing networks

+Display the route buffer on the returned packets
-->ping -R server2

If you cannot reach other computers on the same subnetwork with ping, look for problems in your system's network configuration using arp and ifconfig.

2)arp
Displays and modifies the Internet-to-physical (MAC) address translation tables used by ARP (Address Resolution Protocol). The arp command displays the current ARP entry for the host specified by the Hostname variable.
-->ping 9.3.5.193
No response
-->ping 9.3.5.196
Response
-->arp -a | grep 9.3.5.19
9.3.5.193 = no MAC address
9.3.5.196 = MAC 0:2:55:A8:00:dd
9.3.5.193 has no MAC address entry, so check its cable connections and hardware.

3)ifconfig
-->ifconfig -a -d
Show only those interfaces that are down.
If an interface is down and you have problems reaching the subnet on which the interface is configured, run

-->errpt
to check whether any errors have been reported for the interface (for ex. a duplicate IP address in the network)

-->diag
to run diagnostics on the interface.
If the interfaces have no problems and are active, but your system still cannot reach computers on the same subnetwork, check that the interface's subnet mask is correct.
For example, to change the subnet mask of the en1 interface to 255.255.255.252
-->ifconfig en1 netmask 255.255.255.252 up
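Python's standard `ipaddress` module can show why a wrong mask breaks local traffic. This sketch reuses the addresses from the arp example above purely as sample values. `same_subnet` is a hypothetical helper, not an AIX tool. With a 255.255.255.252 (/30) mask, an interface considers only two addresses local. A host outside that range can then be reached only through a router.

```python
import ipaddress


def same_subnet(local_ip, netmask, remote_ip):
    """True if remote_ip lies inside the subnet defined by local_ip and netmask."""
    network = ipaddress.ip_interface(f"{local_ip}/{netmask}").network
    return ipaddress.ip_address(remote_ip) in network


# Sample values: an interface at 9.3.5.193 with the /30 mask from the text.
iface = ipaddress.ip_interface("9.3.5.193/255.255.255.252")
print(iface.network)  # 9.3.5.192/30
print([str(h) for h in iface.network.hosts()])  # ['9.3.5.193', '9.3.5.194']
print(same_subnet("9.3.5.193", "255.255.255.252", "9.3.5.196"))  # False
print(same_subnet("9.3.5.193", "255.255.255.0", "9.3.5.196"))  # True
```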

4)traceroute (it creates load on the system, so avoid running it on production servers)
-->traceroute
Traces the route of an IP packet. Used for network testing, measurement and management, primarily for manual fault isolation.

A-2) H/W problems
1)errpt
Generates an error report from entries in an error log, but it does not perform error log analysis.
For analysis use
-->diag
For a detailed report use
-->errpt -a

+Class - General source of the error
H-H/W
S-S/W
O-Informational messages
U-Undetermined

+Type - Severity of the error that has occurred
PEND-The loss of availability of a device or component is imminent.
PERF-The performance of the device or component has degraded to below an acceptable level.
PERM-A condition that could not be recovered from
-severe errors, defective H/W or S/W modules
TEMP-A condition that was recovered from after a number of unsuccessful attempts.
UNKN-It is not possible to determine the severity of the error.
INFO-The error log entry is informational and was not the result of an error.

+Resource Name - Name of the resource that has detected the error.
Location code - Path to the device: drawer, slot, connector, port

2)diag
diag uses the error log to diagnose H/W problems.
By default the system deletes -H/W error entries older than 90 days
        -S/W error entries older than 30 days
-->diag
Diagnostic Routines - System Verification - Problem Determination.

B)Reasons to monitor root mail
1)mail
-->mail
Many processes send mail to the root account with detailed information.
-->diagela
Diagnostic Automatic Error Log Analysis
Provides the capability to do error log analysis whenever a permanent H/W error is logged.
It sends a message to your console and to all members of the system group. The message contains an SRN or a corrective action. diagela is enabled by default at BOS installation time.

2)crontab
sends mail to root

3)Other software packages, especially security-related ones, have the ability to specify the administrator.
For example, in case of a security breach, an illegal file permission change or unauthorized password-file access, the system administrator receives a message.

C)System dump facility
The system generates a system dump when a severe error occurs. System dumps can also be user-initiated by users with root authority.
A dump creates a picture of your system's memory contents. System administrators and programmers can generate a dump and analyze its contents when debugging new applications.

a)Configuring a dump device
At installation time a dump device is created: by default /dev/hd6 is the primary dump device and /dev/sysdumpnull the secondary one.
If your system has 4 GB or more of memory, the default primary dump device is /dev/lg_dumplv, a dedicated dump device.
A primary dump device is a dedicated dump device, secondary dump device is shared dump device
The dump device can be configured to either tape or a  logical volume on the hard disk to store the system dump.
+To list the current dump destination
-->sysdumpdev -l
+Change primary dump device from /dev/hd6 to logical volume /dev/dumpdev
-->sysdumpdev -P -p /dev/dumpdev

+Info  about previous dump
-->sysdumpdev -L

+Minimum size for the dump space can be determined by
-->sysdumpdev -e

+increase size of dump device
-->extendlv

1+ Start a system dump
A dump can be system-initiated or user-initiated. If your system stops with 888 flashing in the operator panel display, the system has generated a dump and saved it to the primary dump device.

2+Understanding 888 error messages
It means that either a H/W or a S/W problem has been detected and a diagnostic message is ready to be read.
Record info contained in the 888 sequence message,
-888
-102-unexpected system halt
-mmm-cause of halt-crash code h/w,s/w
-ddd-Dump Status-Dump code
-888
When the system dump completes, the system either halts or reboots, depending on the setting of the autorestart attribute of sys0
-->lsattr -El sys0 -a autorestart
If autorestart is true, the system automatically reboots after a crash.
To change this setting
-->chdev -l sys0 -a autorestart=false
sys0 changed
-->lsattr -El sys0 -a autorestart

+ User initiated dump
-->sysdumpstart -p
write dump to the primary device
-->sysdumpstart -s
to secondary dump device

3+Copy a system dump
-->pax
allows you to copy, create and modify files that are greater than 2 GB in size, such as system dumps, from one location to another. This is useful for migrating dumps, as the tar and cpio commands cannot handle files that are larger than 2 GB. pax can also view and modify files in the tar and cpio formats.
To view the contents of the tar file /tmp/test.tar
-->pax -vf /tmp/test.tar
To create a pax command archive on tape that contains two files
-->pax -x pax -wvf /dev/rmt0 /var/adm/ras/cfglog /var/adm/ras/nimlog

To untar the tar file /tmp/test.tar
to the current directory
-->pax -rvf /tmp/test.tar

To copy the file run.pax to the /tmp directory
-->pax -rw run.pax /tmp

4+snap
Used to gather configuration information of the system. It is a method of sending lslpp & errpt output to your service center, for diagnosing problems.
Default directory for the output from the snap command
-->/tmp/ibmsupt
8MB of temporary disk space is required when executing snap.
To copy general system information ,including file system, kernel parameters and dump information to rmt0
-->/usr/sbin/snap -gfkD -o /dev/rmt0
Also copy a test case of the problem into the /tmp/ibmsupt directory.

5+ Analysing system dumps
kdb -allows you to examine a system dump or running kernel

D) alog
--/var/adm/ras/bootlog
Boot log contains info generated by cfgmgr & rc.boot
To change the size of the boot log
--> echo " boot log resizing " | alog -t boot -s 8192

Display the bootlog
--> alog -t boot  -o | more

E) Determine Appropriate actions for user problems -commands
1)usrck
Verifies the correctness of the user definitions in the user database files by checking the definitions for ALL users or for the users specified by the User parameter.
This command checks
1>/etc/passwd
Duplicate user names are reported and removed. Duplicate IDs are reported but not fixed.
If an entry has fewer than six colon-separated fields, the entry is reported.
2>/etc/passwd - /etc/security/user, /etc/security/limits
usrck verifies that each user name listed in the /etc/passwd file has a stanza in /etc/security/user. It also verifies that each group name listed in /etc/group has a stanza in the /etc/security/group file.

To verify that all the users exist in the user database, and have any errors reported (but not fixed)
-->usrck -n ALL

To delete, from the user definitions, those users who are not in the user database files, and have any errors reported
-->usrck -y ALL (-y fixes and reports errors)
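The following Python sketch gives a rough illustration of the /etc/passwd checks listed above. Like usrck -n, it reports problems but never fixes them. It looks for duplicate names, duplicate IDs, entries with fewer than six fields and users without a stanza in /etc/security/user. It works on lists of lines rather than the real files. `check_passwd` is a hypothetical helper, and usrck itself performs many more checks.

```python
from collections import Counter


def check_passwd(lines, stanzas=None):
    """Report problems in passwd-format lines; nothing is fixed.

    lines:   entries in /etc/passwd format (name:password:uid:gid:...)
    stanzas: optional set of user names that have a stanza in
             /etc/security/user
    """
    entries = [line.split(":") for line in lines if line.strip()]
    problems = [f"too few fields: {':'.join(f)}" for f in entries if len(f) < 6]
    names = Counter(f[0] for f in entries)
    uids = Counter(f[2] for f in entries if len(f) > 2)
    problems += [f"duplicate name: {n}" for n, c in names.items() if c > 1]
    problems += [f"duplicate ID: {u}" for u, c in uids.items() if c > 1]
    if stanzas is not None:
        problems += [f"no stanza in /etc/security/user: {n}"
                     for n in names if n not in stanzas]
    return problems


sample = [
    "root:!:0:0::/:/usr/bin/ksh",
    "joey:!:201:1::/home/joey:/usr/bin/ksh",
    "ross:!:201:1::/home/ross:/usr/bin/ksh",  # same ID as joey
    "joey:!:202:1::/home/joey2:/usr/bin/ksh",  # duplicate name
    "broken:!:300",  # too few fields
]
for problem in check_passwd(sample, stanzas={"root", "joey", "ross"}):
    print(problem)
```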

2) grpck
Verifies the correctness of the group definitions in the user database files by checking the definitions for all the groups or for the groups specified by the Group parameter.
To verify that all the group members and administrators exist in the user database, and have any errors reported (but not fixed)
-->grpck -n ALL

To verify that all group members and administrators exist in the user database, and have errors fixed but not reported
-->grpck -p ALL

To verify the uniqueness of the group name and group ID defined for the abc group
-->grpck -n abc

To be asked interactively whether each error should be fixed
-->grpck -t abc

To fix errors and report them
-->grpck -y abc

3)pwdck
Verifies the correctness of password information.
To verify that all local users have valid passwords
-->pwdck -y ALL
This reports errors and fixes them.

Ensure that user joey has a valid stanza in /etc/security/passwd
-->pwdck -y joey
fixes errors and reports them

4)sysck
Checks file definitions against the files extracted from the installation and update media, and updates the SWVPD.
Used during installation and update of software products.
sysck updates the file name, product name, type, checksum and size of each file in the SWVPD database.
A product that is installed with the installp command has an inventory file in its image.
To add the definitions to the inventory database and check permissions, links and checksums
-->sysck -i -f smart.rte.inventory smart.rte

To remove any links to files for a product that has been removed from the system and remove the files from the inventory database
-->sysck -u -f smart.rte.inventory smart.rte

5)lsgroup & lsuser
-->lsgroup -f ALL >> /tmp/check
-->lsuser -f ALL >> /tmp/check
append the output to the file /tmp/check
-->lsuser joey
used by root for a specific user

6)The user limits
The /etc/security/limits file specifies the process resource limits for each user.
-->mkuser
-->chuser
-->lsuser
-->rmuser

F)Identifying H/W problems
a)Replacing hot plug devices
-->lsslot -c pci
Displays the number, location and capabilities of the hot plug PCI slots.
Before replacing a hot plug adapter or disk, you should unconfigure all other devices or interfaces that depend on the physical device you want to remove.
-->lsdev -C | grep sis
device in available state
The Hot Plug Task can be started with either SMIT or diagnostic (DIAG) tools menu.
-->diag
-Task Selection (Diagnostics, Advanced Diagnostics, Service Aids)
-Hot Plug Task -PCI HOT PLUG MANAGER
        -RAID HOT PLUG DEVICES
        -SCSI & SCSI RAID HOT PLUG MANAGER
-PCI HOT PLUG MANAGER
-Unconfigure a device-Device name -ent2
Go back to
-PCI HOT PLUG MANAGER MENU
-Replace/remove a PCI HOT PLUG Adapter.
After this option has been selected, the PCI slot is put into a state that allows the PCI adapter to be removed.
A blinking attention light identifies the slot that contains the adapter selected for replacement.
Change the adapter now.
Run cfgmgr to configure the new device.
Configure IP
-->smitty chinet
A repair action should be logged in the AIX error report against the ent2 device. This shows others that the error logged in the error report has been solved.
To enter a repair action: diag - Task Selection - Log Repair Action - ent2 device.

G)Failed disk replacement
Reasons to replace a disk
-failed
-report i/o errors and you want to replace it.
-does not satisfy /meet your requirements.

Scenario1
If the disk you are going to replace is mirrored, then
1.Remove the copies of all logical volumes that were residing on that disk using rmlvcopy or unmirrorvg.
2.Remove the disk from the VG using reducevg.
3.Remove the disk definition using rmdev.
4.Physically remove the disk. If the disk is not hot-swappable, you may be required to reboot the system.
5.Make the replacement disk available. If the disk is hot-swappable, you can run cfgmgr; otherwise you need to reboot the system.
6.Include the newly added disk in the VG using extendvg.
7.Recreate and synchronize the copies for all LVs using mklvcopy or mirrorvg.

Scenario2
If the disk you are going to replace is not mirrored and is still functional, then
1.Make the replacement disk available.
If the disk is hot-swappable, you can run cfgmgr; otherwise a reboot is required.
2.Include the newly added disk in the VG using extendvg.
3.Migrate all partitions from the failing disk to the new disk using migratepv or migratelp.
If the disks are part of rootvg, consider the following
-If the old disk contains a copy of the BLV, you have to clear it using -->chpv -c hdiskn
-A new BLV must be created on the new disk using bosboot.
-The boot list must be updated using bootlist.
-If the old disk contains a paging space or a primary dump device, you should disable them. After the migratepv command completes, you should reactivate them.
4.Remove the old disk from the VG using reducevg.
5.Remove the old disk definition using rmdev.

Scenario3
If the disk is not mirrored, has failed completely and there are other disks available in the VG, then
1.Identify all logical volumes that have at least one partition located on the failed disk.
2.Close the LVs: unmount all corresponding filesystems.
3.Remove the filesystems and logical volumes using rmfs.
4.Remove the failing disk from the VG using reducevg.
5.Remove the disk definition using rmdev.
6.Physically remove the disk. If it is not hot-swappable, a reboot is required.
7.Make the replacement disk available. If it is hot-swappable, run cfgmgr; if not, a reboot is required.
8.Add the new disk to the VG using extendvg.
9.Recreate all LVs and filesystems using mklv and crfs.
10.If you have a backup of your data, restore your data from the backup.

Scenario4
If the disk is not mirrored, has failed completely, no other disks are available in the VG (the VG has only one disk, or all PVs failed simultaneously) and the VG is not rootvg, then
1-Export the VG definition from the system using exportvg.
2-Ensure that /etc/filesystems does not contain any incorrect stanzas.
3-Remove the disk definitions using the rmdev command.
4-Physically remove the disk (cfgmgr or reboot).
5-Make the new replacement disk available (cfgmgr or reboot).
6-If you have a VG backup, restore it using restvg.
7-If you don't have a VG backup, recreate the VG, LVs and filesystems.
8-If you have a backup of your data, restore your data from the backup.

Scenario5
If the disk is not mirrored, has failed completely, no other disk is available in the VG and the VG is rootvg, then
-Replace the failing disk.
-Boot in maintenance mode.
-Restore the system from an mksysb image.
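The choice between the five scenarios above is a small decision table, encoded here in a Python sketch. The function name and its boolean parameters are hypothetical and simply mirror the conditions in the scenario descriptions. The command lists summarize the steps above; they are not an executable procedure.

```python
# Main commands of each scenario above, in order (summary only).
SCENARIO_COMMANDS = {
    1: ["unmirrorvg", "reducevg", "rmdev", "cfgmgr", "extendvg", "mirrorvg"],
    2: ["cfgmgr", "extendvg", "migratepv", "reducevg", "rmdev"],
    3: ["umount", "rmfs", "reducevg", "rmdev", "cfgmgr", "extendvg", "mklv", "crfs"],
    4: ["exportvg", "rmdev", "cfgmgr", "restvg"],
    5: ["boot maintenance mode", "restore mksysb image"],
}


def replacement_scenario(mirrored, still_functional, other_disks_in_vg, is_rootvg):
    """Return which of the five disk replacement scenarios applies."""
    if mirrored:
        return 1
    if still_functional:
        return 2  # for rootvg also handle the BLV, bootlist and dump device
    if other_disks_in_vg:
        return 3
    return 5 if is_rootvg else 4


scenario = replacement_scenario(mirrored=False, still_functional=False,
                                other_disks_in_vg=False, is_rootvg=False)
print(scenario, SCENARIO_COMMANDS[scenario])
```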

I)Troubleshoot graphical problems

1)Full /home filesystem
-users will not be able to log in
-looks like hang
-go through command line

2)Name resolution problems
-Check name resolution with nslookup.
-Verify your system's network access.
-Check that the name server is up and running.
-Start and stop the server -->smitty spnamerslv

3)export DISPLAY=server3:2.0
server3 -->xhost +server2
grants server2 access to the X server on server3
server3 -->xhost -server2
denies access

+TTY display problems
-->clear
fails
-->smitty
fails
The TERM variable is not set to the correct value
-->export TERM=vt100

J)perfpmr


MONITORING & PERFORMANCE TUNING

Disk quota
It controls the use of disk space.
It is defined for individual users or groups.
It is maintained for each JFS filesystem.

Disk quota establishes limits based on the following parameters
-User's or group's soft limits,
-User's or group's hard limits,
-Quota grace period.

Soft limit- The number  of 1 kB disk blocks or the number of files under which the user must remain.
Hard limit- Maximum amount of disk blocks or files the user can accumulate under the established disk quotas.

Quota grace period - This period allows the user to exceed the soft limit for a short period of time (the default value is one week).

 If the user fails to reduce usage below the soft limit during the specified time, the system will interpret the soft limit as the maximum allocation allowed, & no further storage is allocated to the user.
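The interplay of soft limit, hard limit and grace period described above can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions, not the JFS quota code. Usage is counted in 1 KB blocks, and the one-week default grace period is taken from the text. The caller tracks when usage first exceeded the soft limit. `may_allocate` is a hypothetical helper.

```python
from datetime import datetime, timedelta

DEFAULT_GRACE = timedelta(weeks=1)  # default grace period quoted in the text


def may_allocate(used, requested, soft, hard, over_soft_since, now,
                 grace=DEFAULT_GRACE):
    """Decide whether `requested` more 1 KB blocks may be allocated.

    over_soft_since: when usage first went above the soft limit, or None
    if usage is currently at or below it.
    """
    total = used + requested
    if total > hard:
        return False  # the hard limit can never be exceeded
    if total <= soft:
        return True  # staying under the soft limit is always allowed
    if over_soft_since is None:
        return True  # first time over the soft limit: the grace period starts
    # Over the soft limit: allowed only while the grace period lasts; after
    # that the soft limit acts as the maximum allocation.
    return now - over_soft_since < grace


now = datetime(2014, 12, 12)
print(may_allocate(110, 5, soft=100, hard=150,
                   over_soft_since=now - timedelta(days=3), now=now))  # True
print(may_allocate(110, 5, soft=100, hard=150,
                   over_soft_since=now - timedelta(days=8), now=now))  # False
```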

Typically, only those filesystems that contain user home directories and files require disk quotas .

Considerations when implementing disk quotas
-Your  system has limited disk space
-You require more file security
-Your disk usage levels are large
(Apply disk quota when above conditions are true)

The disk quota system can be used only with the journaled filesystem.

Don't establish quotas for /tmp: many editors and system utilities create temporary files in the /tmp filesystem, so it must be free of quotas.

The specified file systems must be defined with quotas in the /etc/filesystems file and must be mounted/remounted.
The quotaon command looks for quota.user and quota.group(default quota files) in the root directory of the associated filesystem.


Display quota
-->quota

Use the chfs command to include the userquota and groupquota configuration attributes in the /etc/filesystems file.
--> chfs -a "quota = userquota" /home
enable user quota on the /home filesystem.

--> chfs -a "quota=userquota, groupquota" /home
both user and group quotas are on for /home.

The related entry in /etc/filesystems
/home
dev = /dev/hd1
vfs = jfs
log = /dev/hd8
mount = true
check = true
quota = userquota, groupquota
options = rw

The quota.user and quota.group file names are the default names, located in the root directory of each filesystem.

To name the user quota file myquota.user and the group quota file myquota.group
-->chfs -a "userquota=/home/myquota.user" -a "groupquota=/home/myquota.group" /home

Entry in /etc/filesystems
/home
dev = /dev/hd1
vfs = jfs
log = /dev/hd8
mount = true
check = true
quota = userquota, groupquota
userquota = /home/myquota.user
groupquota = /home/myquota.group
options = rw

To duplicate the quotas established for user joey on to user ross
-->edquota -p joey ross

To enable quotacheck and turn on quotas during system startup, add in /etc/rc
-->vi /etc/rc
echo " Enabling filesystem quotas"
/usr/sbin/quotacheck -a
/usr/sbin/quotaon -a


To enable user quotas for the /usr/Tivoli/server/db filesystem
-->quotaon -u /usr/Tivoli/server/db

Disable user and group quotas for all filesystems in the /etc/filesystems file
-->quotaoff -v -a

To display your quotas as user joey
-->quota joey

Display quotas as the root user for user ross
-->quota -u ross



B)Recovering from a full filesystem

1)Fix a full / (root) filesystem

a)Use who command to read the contents of the /etc/security/failedlogin
--> who /etc/security/failedlogin

b) The condition of TTYs respawning too rapidly can create failed login entries.
To clear the file after  reading or saving the output,execute
-->cp /dev/null /etc/security/failedlogin

c)Check the /dev directory for a device name that was typed incorrectly. If rmto is typed instead of rmt0, a regular file called /dev/rmto is created.
The command will keep writing until / is filled, because /dev is part of the / filesystem.
-->ls -l /dev | pg
Look for entries that are not valid:
-entries that do not have a major or minor number
-wrong file names
-file sizes greater than 500 bytes

d)If system auditing is running, the default /audit directory can rapidly fill up and require attention.

e)Check for large files using find
-->find / -xdev -size +1024 -ls | sort -rn -k7
finds all files larger than 1024 512-byte blocks (512 KB) on the / filesystem and sorts them by size, largest first.

f)Before removing any file, check that it is not currently in use
-->fuser filename
If a file is open at the time of removal, it is only removed from the directory listing. The blocks allocated to that file are not freed until the process holding the file open closes it or is killed.
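For readers who want to script the check, the find | sort pipeline above can be mirrored in Python. This sketch is an equivalent built on stated assumptions, not a replacement for find, and `large_files` is a hypothetical helper. It stays on one filesystem by comparing device numbers, which is the idea behind -xdev. It takes a threshold in bytes, so -size +1024 corresponds to 1024 * 512 bytes.

```python
import os


def _on_device(path, dev):
    """True if path (not followed if it is a symlink) is on device `dev`."""
    try:
        return os.lstat(path).st_dev == dev
    except OSError:
        return False


def large_files(top, min_bytes):
    """(size, path) pairs for files under `top` larger than min_bytes,
    largest first, without crossing into other filesystems (like find -xdev)."""
    top_dev = os.lstat(top).st_dev
    found = []
    for dirpath, dirnames, filenames in os.walk(top):
        # Prune subdirectories that are mount points of other filesystems.
        dirnames[:] = [d for d in dirnames
                       if _on_device(os.path.join(dirpath, d), top_dev)]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.lstat(path).st_size
            except OSError:
                continue  # file vanished or cannot be examined
            if size > min_bytes:
                found.append((size, path))
    found.sort(reverse=True)
    return found


# Example (not run here): the ten largest files over 1024 512-byte blocks on /
# for size, path in large_files("/", 1024 * 512)[:10]:
#     print(size, path)
```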

2) Fix a full /var filesystem

check the following

a)-->find /var -xdev -size +2048 -ls | sort -rn -k7
Look for large files in /var (larger than 2048 512-byte blocks, i.e. 1 MB).
b)Check for obsolete or leftover files in /var/tmp.

c)Check the size of the /var/adm/wtmp file, which logs all logins, rlogins and telnet sessions.
The log will grow indefinitely unless system accounting clears it out nightly.
To clear /var/adm/wtmp
-->cp /dev/null /var/adm/wtmp

To edit the /var/adm/wtmp file, first convert it to a temporary text file with the following command
-->/usr/sbin/acct/fwtmp < /var/adm/wtmp > /tmp/out
Edit the /tmp/out file to remove unwanted entries then replace the original file with the following command
-->/usr/sbin/acct/fwtmp -ic < /tmp/out > /var/adm/wtmp

d)Clear the error log in the /var/adm/ras directory. The error log is never cleared unless it is cleared manually.
[Never use the cp /dev/null command to clear the error log. A zero-length errlog file disables the error logging functions of the operating system and must be replaced from a backup.]

Clear the error log using the following procedure:
-Stop the error daemon
-->/usr/lib/errstop
-Remove the error log file or move it to a different filesystem
-->rm /var/adm/ras/errlog
or
-->mv /var/adm/ras/errlog filename (moved file)
-Restart the error daemon
-->/usr/lib/errdemon
e)Check /var/adm/ras/trcfile. If it is large and trace is not currently running, remove it
-->rm /var/adm/ras/trcfile
f)If your dump device is set to hd6 (the default), there might be a number of vmcore* files in the /var/adm/ras directory. Remove these files if they are old.
g)Check /var/spool/, which contains the queuing subsystem files. To clear the queuing subsystem:
-->stopsrc -s qdaemon
-->rm /var/spool/lpd/qdir/*
-->rm /var/spool/lpd/stat/*
-->rm /var/spool/qdaemon/*
-->startsrc -s qdaemon
h)Check /var/adm/acct/, which contains accounting records. If accounting is running, this directory may contain several large files.
i)Check /var/preserve/ for terminated vi sessions. If a user wants to recover a session, use
-->vi -r
to list all recoverable sessions.
To recover a specific session
-->vi -r filename
j)Check the /var/adm/sulog file, which records each attempted use of the su command and whether it was successful (the file is recreated automatically).
k)Check /var/tmp/snmpd.log, which records events from the snmpd daemon (the file is recreated automatically).
This file's size can be limited using /etc/snmpd.conf.

3) Fix a full user-defined filesystem
+-->find /fs -xdev -size +2048 -ls | sort -rn -k7
Check for files larger than 1 MB (2048 512-byte blocks).

+Remove old backup files and core files.
-->find / \( -name "*.bak" -o -name core -o -name a.out -o -name ed.hup \) -atime +1 -mtime +1 -type f -print | xargs -e rm -f
This removes *.bak, core, a.out and ed.hup files that have been neither accessed nor modified for more than one day.

+To prevent files from regularly overflowing the disk, run
-->skulker
as part of the cron process to remove files that are unnecessary or temporary.

+--> find /var -xdev -mtime 0 -ls
Locate files that have been changed in the last 24 hours.

4) Fix a damaged filesystem
Filesystems get corrupted when the i-node or superblock information for the directory structure of the filesystem gets corrupted, due to hardware errors or faulty programs.

Symptom of a corrupted filesystem
-The system cannot locate, read or write data located in the particular filesystem.

Solution.
1)Unmount the damaged filesystem
-->smit unmountfs (for a filesystem on a fixed disk drive)
-->smit unmntdsk(for a filesystem on a removable disk)
2)Assess filesystem damage by running fsck
-->fsck /dev/myfilelv (unmount first)
Checks and repairs inconsistent filesystems.
3)If filesystem cannot be repaired , restore it from backup


c)The system error log
+Error logging is automatically started by the rc.boot script during system initialization, and is automatically stopped by the shutdown script during shutdown.
The errdemon program starts the error logging daemon ,reads error records from the /dev/error file, and writes entries to the system error log. The default system errorlog is /var/adm/ras/errlog file.
The last entry is placed in NVRAM, and when the system reboots it is written to the error log file.

-->/usr/lib/errdemon
Started at boot, but you can restart it after a failure.

-->/usr/lib/errstop (use carefully, only in special cases)
Stops the error logging daemon and disables diagnostic and recovery functions. The error log should never be stopped during normal operations.

-->/usr/lib/errdemon -l
Determines the path to your system's error log file.

-->/usr/lib/errdemon -s 2000000
Changes the maximum size of the error log file (in bytes).

-->/usr/lib/errdemon -B 64000
Changes the size of the error log device driver's internal buffer.

*errpt command
To retrieve the entries in the error log
1)To display a complete summary report of the errors that have been recorded (it does not perform error log analysis)
-->errpt

To display all the errors that have a specific error ID
-->errpt -j 8527F64

To display all the errors logged in a specific period of time
-->errpt -s 1122160405 -e 1123160405
The dates use the format mmddhhmmyy (MONTH, DAY, HOUR, MINUTE, YEAR).
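Building those mmddhhmmyy strings by hand is error-prone. A small Python sketch with `datetime.strftime` shows the mapping. `errpt_time` and `parse_errpt_time` are hypothetical helper names. The two-digit year is interpreted with Python's `%y` rules (00 to 68 map to 2000 to 2068).

```python
from datetime import datetime, timedelta

ERRPT_FORMAT = "%m%d%H%M%y"  # month, day, hour, minute, two-digit year


def errpt_time(dt):
    """Format a datetime for errpt -s / -e."""
    return dt.strftime(ERRPT_FORMAT)


def parse_errpt_time(stamp):
    """Turn a 10-digit mmddhhmmyy string back into a datetime."""
    return datetime.strptime(stamp, ERRPT_FORMAT)


start = datetime(2005, 11, 22, 16, 4)
end = start + timedelta(days=1)
print(f"errpt -s {errpt_time(start)} -e {errpt_time(end)}")
# errpt -s 1122160405 -e 1123160405
```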

*The errclear command
To delete entries from the errorlog
To delete all entries from the error log
-->errclear 0

*The errlogger command
The errlogger command allows you to log operator messages to the system error log.
The messages can be up to 1024 bytes in length.
-->errlogger "This is a test of the errlogger command"
-->errpt
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME DESCRIPTION
AA8AB241   1129134705 T O OPERATOR      OPERATOR NOTIFICATION

To display the operator notification generated (ID AA8AB241)
-->errpt -a -j AA8AB241
This is a test of the errlogger command.

*Extracting error records from a system dump

    The errdead command extracts error records from a system dump containing the internal buffer maintained by the /dev/error file.
The errdead command extracts the error records from the dump file and adds those error records directly to the error log.
[The error log daemon must not be running when the errdead command is run]
ex. To capture error log info from a dump image that resides in the /dev/hd7 file
-->/usr/lib/errdead /dev/hd7
*Redirecting syslog messages to error log
*Commands for manipulating error messages
errinstall
errupdate
errmsg
ras_logger

D)The system log configuration
The /etc/syslog.conf file controls the behaviour of the syslog daemon: syslogd uses it to determine where to send error messages and how to react to different system events.

-The /etc/syslog.pid file contains the process ID of the running syslogd daemon.

+Format of the configuration file /etc/syslog.conf
Each line has 3 parts
        -Facilities (which application)
        -Priorities (seriousness)
        -Destinations (send to whom)
Facilities -- kern-kernel
        user-user-level messages
        mail-mail system
        daemon, auth, syslog, lpr, news, uucp

Priorities -- Message priority, from most to least severe
emerg-emergency messages
alert-important messages, such as a serious H/W error (distributed to all users)
crit-critical messages not classified as errors, such as improper login attempts
err-error conditions, such as an unsuccessful disk write
warning-abnormal but recoverable conditions
notice-important informational messages
info-informational messages, useful in analyzing the system
debug-debugging messages
none-excludes the selected facility

Destinations --
File name - Full path name of a file opened in append mode.
@Host - Host name, preceded by @
User(s) - Comma-separated user names
* - All users

+Using the system log
After customizing the /etc/syslog.conf file, restart the syslogd daemon
--> stopsrc -s syslogd
--> startsrc -s syslogd
A few examples (facility.priority destination):
1)To log all mail facility messages at the debug level (and above) to the file /tmp/mailsyslog
mail.debug /tmp/mailsyslog

2)To send all system messages except those from the mail facility to a host named barney
*.debug;mail.none @barney

3)To send messages at the emerg priority level from all facilities, and messages at the crit priority level and above from the mail and daemon facilities, to users joey and ross
*.emerg;mail,daemon.crit joey,ross

4)To send all mail facility messages to all users' terminal screens
mail.debug *
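The following Python sketch shows how a selector field decides whether a message is logged. It follows the traditional BSD syslogd rules, and AIX syslogd may differ in details. A message matches when its priority is at least as severe as the selector's. Later items override earlier ones for the facilities they name, and `none` excludes a facility. `selects` is a hypothetical helper.

```python
# Priorities from most to least severe, as listed above.
PRIORITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def selects(selector, facility, priority):
    """True if the selector field of a syslog.conf line matches a message.

    selector: e.g. "*.emerg;mail,daemon.crit"
    """
    matched = False
    for item in selector.split(";"):
        facilities, level = item.strip().rsplit(".", 1)
        if facilities != "*" and facility not in facilities.split(","):
            continue  # this item says nothing about the facility
        if level == "none":
            matched = False  # facility explicitly excluded
        else:
            # lower index = more severe
            matched = PRIORITIES.index(priority) <= PRIORITIES.index(level)
    return matched


print(selects("*.debug;mail.none", "kern", "info"))  # True
print(selects("*.debug;mail.none", "mail", "emerg"))  # False
print(selects("*.emerg;mail,daemon.crit", "daemon", "crit"))  # True
print(selects("*.emerg;mail,daemon.crit", "kern", "crit"))  # False
```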

E)Performance tools overview
1) vmstat
Reports statistics about kernel threads, virtual memory, disks, traps and CPU activity.
Used to balance system load activity.
-->vmstat
summary of the virtual memory activity since system startup

+Display five summaries at 1-second intervals
-->vmstat 1(interval) 5(reports)
The first report shows statistics since system startup; each following report covers the preceding interval.

+Display the count of various events
-->vmstat -s

+To display 5 summaries for hdisk0 and hdisk1 at 2 seconds interval
-->vmstat hdisk0 hdisk1 2 5

+Number of forks,since system startup
-->vmstat -f

2)sar
Collects, reports and saves system activity information.
+To report current activity every 2 seconds, 5 times (10 seconds in total)
-->sar 2 5 (5 reports, 2 seconds apart)

+To report CPU activity for the first two processors every second for the next five seconds
-->sar -u -P 0,1(processors) 1(interval in seconds) 5(reports)

3)topas

Reports vital statistics about the activity on the local system on a character terminal.
It extracts and displays statistics at a default interval of two seconds, including:
-Overall system statistics
-List of busiest processes
-WLM statistics
The bos.perf.tools and perfagent.tools filesets must be installed on the system to run topas.

Parameters shown by topas
-CPU utilization
 user + system usage, wait, idle
-Network interfaces
 list of NICs, throughput, data received, data transmitted
-Physical disks
 list, Busy%, KBPS, TPS
-WLM classes
-Processes
 name, PID, CPU utilization, paging space (PS) usage
-Events/queues
-File/TTY
-Paging
-Memory
-Paging space (P.S.)
-NFS
-->topas -P
full-screen list of the busiest processes
-->topas -D
full-screen disk metrics
-->topas -i5 -n0 -p10
shows the top 10 processes at 5-second intervals, without any network interface statistics

*svmon
Captures and analyzes a snapshot of virtual memory: current memory usage, and help in spotting memory leaks
-->svmon -P pid -i 1 3
memory usage of process pid, reported every 1 second, 3 times

*netstat
-->netstat -i
verify the status of all NICs
-->netstat -in
MAC and IP addresses (numeric) and MTU size of each interface
-->netstat -rn
routing table
-->netstat -Cn
display route costs, if you have multiple routes with different costs to the same destination
-->netstat -m
statistics on the memory buffers the kernel uses for communication
-->netstat -v ent0 | more
device driver information
-->netstat -s
statistics for all protocols: icmp, udp, tcp, igmp, ip
-->netstat -p icmp   (or -p ip)
statistics for one particular protocol
-->netstat -a   (also -an)
all sockets open on your system

5)iostat
Reports CPU statistics, asynchronous I/O (AIO) statistics, and I/O statistics for the entire system, adapters, TTY devices, disks and CD-ROMs.
Use iostat:
-When investigating performance problems
-After H/W and S/W changes to the disk subsystem
-After changes to attributes of VGs, LVs or file systems
-After changes to the OS
-After changes to applications

To determine if a physical disk has become a performance bottleneck
-->iostat -T -d 1 60
Monitors disk activity for 60 seconds
Check the %tm_act (percentage of time the disk was active) and Kbps columns

To display more detailed disk statistics, we artificially create disk activity on hdisk0 (in the background) and then produce 10 performance reports, one every 2 seconds
-->dd if=/dev/hdisk0 of=/dev/null &
-->iostat -D hdisk0 2 10

Display cpu utilization
-->iostat -T -t 1 60
Monitor cpu activity for 60 seconds

AIO utilization
-->iostat -A

List mounted file systems with their AIO queue request counts
-->iostat -AQ

Adapter utilization
-->iostat -a 1 10 | more
-->iostat -a -D | more

6)Procmon tool(Graphical)
Allows you to view and manage the processes running on a system
Default refresh interval: 5 seconds
For each process it shows: priority, nice value, how long it has been running, how much CPU, memory and I/O it is using, and who created it
The following fileset must be installed:
-bos.perf.gtools
-->./opt/perfwb/procmon/procmon/
-->tty
terminal number

To wait for a process to finish and display its status, use procwait with the PID of that process
-->find / -type f > /dev/null 2>&1 &
-->procwait -v PID
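For a job started from the same shell, the POSIX `wait` builtin gives a portable equivalent. This is a minimal sketch; unlike procwait, `wait` only works for children of the current shell, and `run_and_wait` is a hypothetical helper name.

```sh
#!/bin/sh
# run_and_wait CMD [ARGS...]
# Starts CMD in the background, waits for that PID to finish and
# reports its exit status.
run_and_wait() {
    "$@" &
    pid=$!                               # PID of the background job
    if wait "$pid"; then                 # block until it terminates
        status=0
    else
        status=$?                        # wait returns the job's exit status
    fi
    echo "process $pid exited with status $status"
}
```

Usage: `run_and_wait sh -c 'find / -type f > /dev/null 2>&1'`.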

F)Tuning using the /etc/tunables files
/etc/tunables/nextboot - tunable values applied at boot time
/etc/tunables/lastboot - tunable values in effect after the last boot
/etc/tunables/lastboot.log - log of the changes made at the last boot

G)Documenting a system configuration
List device attributes
-->lsattr -El ent0
Display the status and location code of all disk devices
-->lsdev -Cc disk
Display characteristics and capabilities of hot-plug PCI slots
-->lsslot -c pci
Display system machine type and serial number
-->lscfg -vp | grep -ip cabinets
Display info about H/W and S/W
-->prtconf

FILE SYSTEMS

*File system types: JFS, enhanced JFS (JFS2), NFS, CD-ROM file system.

A)File system structure
>Superblock
Contains control information about a file system:
-overall size of the file system in 512-byte blocks
-file system name
-file system log device
-version number
-number of inodes
-free inodes
-free data blocks
-date and time of creation
-file system state

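All this data is stored in logical block 1 of the file system (block 0 is reserved). Corruption of this data may render the file system unusable, which is why the system keeps a second copy of the superblock in logical block 31. To copy the backup over the primary superblock (/dev/hdn stands for the logical volume):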
-->dd count=1 bs=4k skip=31 seek=1 if=/dev/hdn of=/dev/hdn
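Because that dd command overwrites a live logical volume, it can help to rehearse it on a scratch file first. This sketch builds a hypothetical 32-block image, damages block 1 and restores it from block 31 with the same skip/seek arithmetic. It assumes a dd that supports conv=notrunc (GNU and BSD dd do); that option is needed only because the target here is a regular file, which dd would otherwise truncate.

```sh
#!/bin/sh
# Scratch image standing in for a logical volume (hypothetical path).
img=${TMPDIR:-/tmp}/sb_demo.$$.img
bs=4096

# 32 blocks of zeros (blocks 0..31), i.e. 128 KB.
dd if=/dev/zero of="$img" bs=$bs count=32 2>/dev/null

# Pretend block 31 holds the backup superblock ...
printf 'BACKUP-SUPERBLOCK' | dd of="$img" bs=$bs seek=31 conv=notrunc 2>/dev/null
# ... and block 1 (the primary superblock) has been damaged.
printf 'GARBAGE' | dd of="$img" bs=$bs seek=1 conv=notrunc 2>/dev/null

# The restore: copy one block from block 31 onto block 1.
dd count=1 bs=$bs skip=31 seek=1 if="$img" of="$img" conv=notrunc 2>/dev/null

# Show the first 17 bytes of block 1: prints BACKUP-SUPERBLOCK
dd if="$img" bs=1 skip=$bs count=17 2>/dev/null; echo
```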

>Allocation group
An allocation group consists of inodes and their corresponding data blocks.

>inodes
Contain information about the file:
-type
-size
-owner
-date and time
when the file was last modified and last accessed, and when the inode itself was last changed
-pointers to the data blocks that store the actual data of the file

-Every file has a corresponding inode
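A quick way to see inodes at work on any Unix system is to compare inode numbers with `ls -i`. This sketch uses scratch files under a hypothetical directory: a hard link is a second name for the same inode, while a copy gets a new inode.

```sh
#!/bin/sh
dir=${TMPDIR:-/tmp}/inode_demo.$$
mkdir -p "$dir"
echo "hello" > "$dir/original"
ln "$dir/original" "$dir/hardlink"     # second directory entry, same inode
cp "$dir/original" "$dir/copy"         # new file, new inode

# Print just the inode number of a file.
inode_of() { ls -i "$1" | awk '{print $1}'; }

echo "original: $(inode_of "$dir/original")"
echo "hardlink: $(inode_of "$dir/hardlink")"
echo "copy:     $(inode_of "$dir/copy")"
```

On AIX, `istat` on the same files shows the full inode contents.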

For a JFS file system, the maximum number of inodes, and hence the maximum number of files, is determined by the nbpi (number of bytes per inode) value, which is specified when the file system is created. For every nbpi bytes of the file system, one inode is created.
The total number of inodes is fixed.
The nbpi value needs to be correlated with the allocation group size.

JFS restricts every file system to at most 16M (2^24) inodes.
JFS2 manages the space for inodes dynamically, so there is no nbpi parameter.
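The JFS inode arithmetic can be sketched in shell. This is an illustration only: it ignores allocation group rounding, and IBM's documented limits also depend on the allocation group size. The function names are hypothetical.

```sh
#!/bin/sh
# jfs_inodes SIZE_BYTES NBPI
# One inode for every NBPI bytes of file system space.
jfs_inodes() {
    echo $(( $1 / $2 ))
}

# jfs_max_fs_bytes NBPI
# Largest size whose inode count stays within the 2^24 inode limit.
jfs_max_fs_bytes() {
    echo $(( 16777216 * $1 ))
}
```

For example, a 1 GB file system (1073741824 bytes) with nbpi=4096 gets 262144 inodes, and with nbpi=512 the 2^24 limit is reached at 8589934592 bytes (8 GB), which is why a small nbpi also caps the file system size.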

>Data blocks
Data blocks store the actual data of the file or pointers to other data blocks. The default logical block size is 4 KB.

>Fragments -for jfs filesystems only
Fragments of logical blocks can be used to support files smaller than the standard logical block size (4 KB). This applies only to the last block of a file smaller than 32 KB.
You also have the option to use compression, which allows all logical blocks of a file to be stored as a sequence of contiguous fragments.
These features can be useful to support a large number of small files. The fragment size must be specified when the file system is created.
Different file systems can have different fragment sizes.
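The space saving from fragments can be estimated with a little arithmetic. This sketch assumes 4 KB logical blocks and no compression: files under 32 KB store their last partial block in fragments, larger files use whole 4 KB blocks. `jfs_disk_usage` is a hypothetical helper.

```sh
#!/bin/sh
# jfs_disk_usage FILE_BYTES FRAG_BYTES
# Approximate on-disk bytes for a file in a JFS file system.
jfs_disk_usage() {
    size=$1 frag=$2 block=4096
    if [ "$size" -lt 32768 ]; then
        full=$(( size / block * block ))          # whole 4 KB blocks
        tail=$(( size - full ))                   # leftover bytes
        frags=$(( (tail + frag - 1) / frag ))     # round up to fragments
        echo $(( full + frags * frag ))
    else
        echo $(( (size + block - 1) / block * block ))   # whole blocks only
    fi
}
```

With 512-byte fragments a 5000-byte file uses 5120 bytes (one 4 KB block plus two fragments) instead of 8192, while a 100-byte file uses 512 bytes instead of a full 4 KB block.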

>Device logs
The journaled file system log stores transactional information about file system metadata changes.
Data in data blocks is not journaled: log devices ensure file system integrity, not data integrity.
This information can be used to roll back incomplete operations if the machine crashes.
JFS - jfslog
JFS2 - jfs2log

After the operating system is installed, all file systems within the rootvg use logical volume hd8 as a common log.
You can create a JFS2 file system that uses an inline log. This means the log data is written into the same logical volume as the file system rather than into a separate log logical volume.

 B)File system differences
Function                       JFS              JFS2
Maximum file system size       1 TB             4 PB
Maximum file size              64 GB            4 PB
Number of inodes               fixed            dynamic
Inode size                     128 bytes        512 bytes
Fragment size                  512 bytes        512 bytes
Block size                     4096 bytes       4096 bytes
Directory organization         linear           B-tree
Compression                    yes              no
JFS log                        external (hd8)   external + inline
Default ownership at creation  sys.sys          root.system
SGID of default file mode      SGID=on          SGID=off
Quotas                         yes              yes
File system shrink             not possible     possible (AIX 5.3+)

*To migrate data from a JFS file system to a JFS2 file system, you have to back up the JFS file system and restore the data on the JFS2 file system.

C)Filesystem management
>Create a filesystem
1)-->crfs -v jfs -g testvg -a size=10M -m /fs1
Creates, within volume group testvg, a 10 MB JFS file system with mount point /fs1.
A new logical volume is created for the file system, and if there is no existing JFS log device in the volume group, the system creates one now.

2)-->crfs -v jfs2 -g testvg -a size=10M -p ro -m /fs2
Creates, in testvg, a 10 MB JFS2 file system with mount point /fs2 and read-only permission.

A new logical volume is created for it; if there is no JFS2 log device in the volume group, one is created too.
-->grep -p fs1 /etc/filesystems
/fs1:
        dev = /dev/lv00
        vfs = jfs
        log = /dev/loglv00
        mount = false    (do not mount automatically at boot)
        account = false
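Plain grep prints only the matching lines, not the whole stanza; AIX grep -p prints the whole paragraph. As a portable alternative, here is a hedged awk sketch that prints one stanza from /etc/filesystems-style text. It assumes stanza headers start in column 1, attribute lines are indented and comment lines begin with *; `show_stanza` is a hypothetical helper.

```sh
#!/bin/sh
# show_stanza MOUNT_POINT < FILE
# Prints the stanza whose header is "MOUNT_POINT:".
show_stanza() {
    awk -v target="$1:" '
        /^\*/       { next }                      # comment lines begin with *
        /^[ \t]*$/  { next }                      # skip blank lines
        /^[^ \t]/   { inblk = ($1 == target) }    # unindented line = header
        inblk       { print }
    '
}
```

Usage: `show_stanza /fs1 < /etc/filesystems`.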

3)Using crfs with existing components
-->lsvg -l testvg
shows testlv, a jfs2 logical volume (size 128 MB, 1 PP) that exists but is not associated with any file system
-It also shows a jfs2 log device, loglv01, that is not attached to any file system.
Using these existing components, we create a JFS2 file system on testlv that uses the jfs2 log device loglv01 and has /test as its mount point.
-->crfs -v jfs2 -d /dev/testlv -a logname=loglv01 -m /test -a size=130M
Although the size specified for the file system is larger than the logical volume itself, the size parameter is ignored and the file system size is set to the size of the logical volume.

>Mounting and unmounting fs
Mounting is the only way to make a file system accessible.
When a file system is mounted over a directory, the permissions of the root directory of the mounted file system take precedence over the permissions of the mount point.
In other words, while the file system is mounted, users see the permissions of the file system's root directory at the mount point, not those of the underlying directory.
-->mount /dev/fslv02 /test
-->umount /test

+Display mounted filesystems using the mount command
-->mount

+Display the characteristics of filesystems
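-->lsfs -a
-->lsfs -q
(-q also reports the logical volume size and superblock information such as the fragment size and nbpi)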

>Removing a filesystem
Unmount the file system before deleting it. The rmfs command deletes the corresponding stanza from /etc/filesystems and the logical volume on which the file system resides.
-->rmfs /test
fails with an error if the file system is still mounted

-->umount /test
-->rmfs /test
-->cat /etc/filesystems | grep test

>Changing the attributes of a file system
Use the chfs command to change the attributes of a file system, such as:
        -mount point permissions
        -log device
        -size
-->lsfs -a
/dev/fslv00    --    /fs2    jfs2    243322    ro    no

-->chfs -a size=250M -p rw /fs2
filesystem size changed to 512M

If the new size for the filesystem is larger than the size of the logical volume, the logical volume will be extended to accommodate the filesystem, provided that it does not exceed the maximum number of logical partitions.
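File system and logical volume sizes grow in whole partitions, so the size chfs reports can be larger than the one requested. A minimal sketch of that rounding, assuming the PP sizes shown are examples rather than values read from testvg; `round_to_pp` is a hypothetical helper.

```sh
#!/bin/sh
# round_to_pp REQUESTED_MB PP_MB
# Rounds a requested size up to a whole number of physical partitions.
round_to_pp() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}
```

A 250 MB request becomes 256 MB with a 128 MB PP size and 512 MB with a 512 MB PP size; a 10 MB request with a 128 MB PP size becomes 128 MB.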

>Checking filesystem consistency
-The fsck command checks file system consistency and interactively repairs the file system.
**Do not run the fsck command on a mounted file system.
-You must be able to read the device file on which the file system resides.
-fsck tries to repair file system metadata structures: it displays information about inconsistencies and prompts you for permission to repair them.
-fsck does not recover data from data blocks.
If you have lost data, you have to restore it from a backup.
-When the system boots, the fsck command is called to verify the /, /usr, /var and /tmp file systems.
An unsuccessful result prevents the system from booting.
--at boot time
-->fsck -f / /usr /var /tmp
checks and repairs file system metadata
does not recover data

>Log Devices
+Creating log devices
As your file systems grow, you should consider either increasing the size of the default log or creating new log devices.
Use the mklv command with the -t flag to specify the logical volume type, jfslog or jfs2log.

+Initializing log devices
Log devices (jfslog, jfs2log or inline logs) are initialized with the logform command, which clears all log records.

The logform command does not affect the data itself.
To initialize the jfs2log device named loglv01:
-->logform /dev/loglv01

D) Defragmenting a filesystem
The use of fragments and compression, as well as the creation and deletion of large numbers of files, can decrease the amount of contiguous free disk space.
-->defragfs /home
improves the amount of contiguous free space within a file system
E)Displaying info. about inodes
-->istat filename
-->istat /etc/passwd

F)Troubleshooting file system problems
>Superblock error recovery
Errors: fsck: not an aix3 fs
    fsck: not an aix4 fs
    fsck: not a recognized filesystem type
    mount: invalid argument
Solution: restore the backup of the superblock over the primary superblock
-->dd count=1 bs=4k skip=31 seek=1 if=/dev/lv00 of=/dev/lv00

-->fsck -f /dev/lv00
If the problem is still not solved, recreate the file system and restore the data from a backup.

>Cannot unmount filesystems
A file system cannot be unmounted while any references are still active within that file system.
The following situations can leave open references to a mounted file system.
+Files are open within the file system
-->fuser -c /fs
shows the processes that have files open within the file system

-->kill PID

+A kernel extension loaded from the file system is running
-->genkex
reports on all loaded kernel extensions

+Filesystems are still mounted within that file system
-Unmount all the filesystems that are mounted within the filesystem to be unmounted

+A user is using a directory within the file system as the current working directory (cwd).
-->find /home -type d -exec fuser -u {} \;
/home/prashant: 3548c(prashant)
fuser appends the letter "c" to the process IDs of all processes that are using a directory as their cwd. The -u flag shows the owner of each process.

>Full filesystems
-->df
shows the free space in each mounted file system
-->du
shows the space used by files and directories
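When df shows a full file system, du is typically combined with sort to find what is taking the space. A hedged sketch; `biggest` is a hypothetical helper, the `"$1"/*` glob skips dot-files, and du may descend into other file systems mounted below the directory.

```sh
#!/bin/sh
# biggest DIR [N]
# Lists the N (default 10) largest entries directly under DIR,
# size in KB, largest first.
biggest() {
    du -sk "$1"/* 2>/dev/null | sort -rn | head -n "${2:-10}"
}
```

Usage: `biggest /home 5`.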