Veritas Cluster Server Agents (VCS agents)

http://www.unixarena.com/2012/06/vcs-switch-over-issue-due-to-failed.html


VCS agents:
A VCS agent communicates with HAD (the high availability daemon), which supplies it with the resource configuration and attributes. For example, the Mount agent must be able to monitor a filesystem and to mount or unmount it when instructed.

Agents are multi-threaded processes that provide the logic to manage resources. VCS has one agent per resource type. The agent monitors all resources of that type; for example, a single Mount agent manages all Mount resources. When the agent is started, it obtains the necessary configuration information from VCS. It then periodically monitors the resources and updates VCS with the resource status.
The above notes are from https://sort.symantec.com.

You may have the following questions about VCS agents. It is better to know what they are and what they actually do.

Who provides the VCS agents?
The resource agents listed below ship with VCS. If one of these agents does not start, contact Symantec to fix the issue.
1.Oracle
2.Sybase
3.LDOM
4.IP
5.NIC
6.Zone
7.Application
and many more.
Symantec also provides additionally developed agents to support various applications on VCS. For example, to cluster an SAP application you need an SAP agent. In some cases the original software vendor (independent software vendor, ISV) provides the agent for VCS.

What are the properties of VCS agents?
1.Only one agent daemon runs on a system for each configured resource type.
2.An agent runs a single operation on a resource at one time.
3.Agents can perform operations on multiple resources of the same type in parallel.
4.If there are no resources of a particular type anywhere in the cluster, the agent for that type is not started.
5.A resource cannot be managed without an agent.

What are the functions of a VCS agent?
1.Start a specified program
2.Stop a specified program
3.Monitor the program
4.Clean up after a fault.
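
As a rough illustration of the monitor function, here is a minimal sketch of what a script-based monitor check could look like. The helper name monitor_pid and the idea of monitoring by PID are hypothetical. The return values 110 (online) and 100 (offline) are the convention for script-based agent functions as far as I know, but treat them as an assumption and check them against the Agent Developer's Guide for your VCS version.

```shell
#!/bin/sh
# Hypothetical monitor helper: reports whether the process with the
# given PID is alive and signalable by the current user.
# Assumed convention: 110 = online, 100 = offline.
monitor_pid() {
    # kill -0 sends no signal; it only checks that the process exists
    if kill -0 "$1" 2>/dev/null; then
        return 110
    fi
    return 100
}

# Example: the current shell is certainly running, so this reports 110.
rc=0
monitor_pid "$$" || rc=$?
echo "monitor returned $rc"
```

A real agent would derive the PID (or some other health check) from the resource attributes that HAD passes to it, instead of taking it as a plain argument.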

How is an agent built?
1.A single agent binary that contains all the functions needed to control the resource.
2.An agent binary plus a collection of scripts that implement the agent functions not included in the binary.

Issues with VCS agents:
In a production environment you may run into the issue below quite often. Here is how to fix it.
VCS switch-over issue due to a failed agent
VCS error: V-16-1-10195, resulting from a failed failover of a service group.
The failover of a service group fails, and hastatus -sum shows that an agent is 'failed' on the system the service group is supposed to fail over to.

Example of hastatus -sum output:
-- AGENTS FAILED
-- Type System
I IPMultiNIC node1
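
If you want to pick the failed agents out of that output in a script, the following is one possible sketch. The function name failed_agents is made up for this example, and it assumes the section layout shown above (an all-caps "-- AGENTS FAILED" heading, a "--" column header, then one line per agent). The sample input is the output from this article, not a live cluster.

```shell
#!/bin/sh
# Prints "<agent type> <system>" for each entry in the AGENTS FAILED
# section of `hastatus -sum` output read from stdin.
failed_agents() {
    LC_ALL=C awk '
        /^-- AGENTS FAILED/   { in_section = 1; next }  # our section starts
        /^-- [A-Z][A-Z ]*$/   { in_section = 0; next }  # some other all-caps section heading
        /^--/                 { next }                  # column header line
        in_section && NF >= 3 { print $2, $3 }          # type and system columns
    '
}

# Sample input taken from the example output above.
failed_agents <<'EOF'
-- AGENTS FAILED
-- Type System
I IPMultiNIC node1
EOF
# prints: IPMultiNIC node1
```

On a cluster node the same function could be fed from the real command, for example: hastatus -sum | failed_agents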

This issue can be resolved with the following command. It simply starts the VCS agent if it failed abnormally or stopped for an unknown reason.
# haagent -start IPMultiNIC -sys node1

This command applies to any VCS agent that is shown as failed in the hastatus output. The example here uses "IPMultiNIC", but the resource agent can be any other, such as Mount, DiskGroup or Volume.
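
When several agents have failed, the same command has to be repeated for each one. Below is a small dry-run sketch that only prints the haagent -start commands and does not run them. The function name start_commands and the "type system" input format are assumptions made for this example.

```shell
#!/bin/sh
# Reads "<agent type> <system>" pairs on stdin and prints one
# haagent -start command per pair. Nothing is executed (dry run).
start_commands() {
    while read -r agent sys; do
        # skip blank or incomplete lines
        [ -n "$agent" ] && [ -n "$sys" ] || continue
        printf 'haagent -start %s -sys %s\n' "$agent" "$sys"
    done
}

# Example with two failed agents on node1.
printf '%s\n' 'IPMultiNIC node1' 'Mount node1' | start_commands
# prints:
# haagent -start IPMultiNIC -sys node1
# haagent -start Mount -sys node1
```

Printing the commands first gives you a chance to review them before pasting them into a root shell on the node.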

Thank you for reading this article. Please leave a comment if you have any questions, and I will get back to you.

ISSUE:
When attempting to switch a clustered Service Group by right-clicking it and selecting the Switch command (or by doing so manually through the Command Line Interface), the Service Group fails to switch and the error V-16-1-10195 'Cannot perform switch' is displayed.

SOLUTION:
There are several possible causes of this rare issue. It is generally caused by the failure of a specific agent on the node the service group is being switched to.

To find the failed agent, simply run the following command through the Command Line Interface:

hastatus -sum

Near the bottom of the output there is a section headed 'AGENTS FAILED'. Below it is a list of all agents that have failed to start on the node.
In this situation, the agents simply need to be restarted. This can be done in one of two ways:

1) Issuing hastop -local -force and then hastart should restart the agents, but will also restart all cluster services on that node only. Production service groups should not be online on this node when these commands are run.


2) Issuing haagent -stop <agent> -force -sys <system name> followed by haagent -start <agent> -sys <system name>, where <agent> is the name of the failed agent, should restart the agent properly. Again, it is suggested that any production Service Group not affected by the failed agent be moved to another clustered node so that production is not disrupted.
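
The second method can be wrapped in a small helper. This is only a sketch: the function name restart_agent and the RUN variable are hypothetical. By default RUN is "echo", so the commands are printed and not executed. On a real node a script would probably also want to confirm that the agent has actually stopped before starting it again.

```shell
#!/bin/sh
# RUN is a hypothetical switch: when unset it defaults to "echo" and the
# commands are only printed; set RUN to an empty string to execute them.
RUN=${RUN-echo}

# restart_agent <agent type> <system name>
restart_agent() {
    agent=$1
    sys=$2
    # Start the agent only if the stop command succeeded.
    $RUN haagent -stop "$agent" -force -sys "$sys" &&
        $RUN haagent -start "$agent" -sys "$sys"
}

restart_agent IPMultiNIC node1
```

With the default RUN value this prints the stop command followed by the start command for IPMultiNIC on node1.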

Reposted from rooi.iteye.com/blog/1889001