Today I tried to upgrade Foundation on one of our Nutanix
clusters, and it failed with an error saying “Foundation Service running on one
of the nodes, Test Failed”. I wondered why the Foundation service was
running on the CVMs, so I checked the recent tasks in Prism. I noticed that
a number of LCM checks had failed recently, and that could be the
cause, since the Foundation service does not run under normal operations.
So I SSHed into one of the CVMs and ran the following
command.
allssh 'genesis status | grep foundation'
I noticed that a few CVMs were currently
running the Foundation service. In the highlighted output (sorry for the
rough image, I didn’t have a proper image editing tool at the time I was
writing this article) you can see the process IDs inside the brackets for each
service; for the CVMs that are not running the Foundation service, no process ID
is listed.
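On a larger cluster it can be tedious to read through that output by eye. The following is a minimal sketch of a filter that prints only the CVM addresses where Foundation shows a process ID. The function name cvms_running_foundation is made up for this example, and the sketch assumes that allssh prints a header line of the form "===== <CVM IP> =====" before each CVM's output and that genesis status prints a line like "foundation: [pid, pid]", or "foundation: []" when the service is stopped. Check both against the output on your own cluster before relying on it.

```shell
# Hypothetical helper: reads the output of
#   allssh 'genesis status | grep foundation'
# on stdin and prints the IP of every CVM where foundation lists a PID.
# Assumed formats (verify on your cluster):
#   header line:  ================== <CVM IP> =================
#   status line:  foundation: [1234, 1235]   (running)
#                 foundation: []             (not running)
cvms_running_foundation() {
  awk '
    # Remember the CVM address from the most recent header line.
    /^=+ / { ip = $2; next }
    # Print the address when a digit follows the opening bracket.
    /foundation:/ && /\[ *[0-9]/ { print ip }
  '
}
```

It could then be used as: allssh 'genesis status | grep foundation' | cvms_running_foundation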
I then SSHed into each of those CVMs and issued the following command
to stop the Foundation service.
genesis stop foundation
Once you hit Enter, it stops the Foundation service and lists
the services currently running on the CVM. If you look closely at the highlighted
area, no process ID is shown for Foundation.
Once I had stopped the Foundation service on each CVM, I was able to
continue the Foundation upgrade as usual.
Please note that if you destroy a cluster, the Foundation
service starts and keeps running until you create a new cluster or add the
nodes to an existing cluster. In the scenario above it was a production cluster,
so that did not apply, and my guess was correct: LCM uses the Foundation service
to run certain operations.