VMmanager Troubleshooting


Issues with control panel

  • Upgrading VMmanager KVM to VMmanager Cloud.

This operation is not supported.

  • VMmanager is freezing and running slowly, a "too many connections" error is shown in the log

A possible cause of the issue: libvirt is hanging. Make sure that libvirt is accessible; restart it if needed.
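A minimal check sketch (the service name and init system vary by distribution):

virsh -c qemu:///system list      # hangs or fails if libvirt is stuck
service libvirtd restart          # or: systemctl restart libvirtd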

  • What processes are important for VMmanager? What can I monitor?

The ihttpd process (the built-in web server) is critical. You also need to make sure that libvirt is up and running on the cluster nodes.
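A hedged monitoring sketch (process names as they appear elsewhere in this article):

pgrep -l ihttpd      # built-in web server, on the master server
pgrep -l core        # control panel core, on the master server
pgrep -l libvirtd    # on each cluster node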

  • What units of measure are used in the control panel?

KiB and MiB are used:

  • KiB (kibibyte): 2^10 = 1024 bytes;
  • MiB (mebibyte): 2^20 = 1048576 bytes.

How do they differ from KB and MB?

  • KB (kilobyte): 10^3 = 1000 bytes;
  • MB (megabyte): 10^6 = 1000000 bytes.

How to create a virtual machine using MB and GB:

  • If you want to create a virtual machine with 2 GB of RAM, enter 1907 MiB in the VM creation form (the more exact value is 2 GB = 1907.35 MiB);
  • If you want to create a virtual machine with a 15 GB HDD, enter 14305 MiB.

How to calculate units of measure:

The previous examples in GiB will look like the following:

  • 2 GiB = 2048 MiB;
  • 15 GiB = 15360 MiB;

For more information, please refer to the article Binary prefixes.
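A minimal conversion sketch in the shell (decimal GB to binary MiB, integer arithmetic):

echo $(( 2 * 1000**3 / 1024**2 ))     # 2 GB  -> 1907 MiB
echo $(( 15 * 1000**3 / 1024**2 ))    # 15 GB -> 14305 MiB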

Issues with cluster nodes

  • After adding a virtual machine, one of the cluster nodes becomes inaccessible. When trying to connect to it via SSH, you reach the virtual machine instead

When IP addresses for virtual machines are allocated from the same network that the cluster node IP addresses belong to, the system may allocate an IP address that is already assigned to a cluster node, and the node will become inaccessible. To avoid this, reserve the IP addresses of cluster nodes in the local IP pool or in IPmanager, if your system is integrated with this panel.

To release the IP address, assign a new IP to that virtual machine: navigate to "Management" -> "Virtual machines" -> select a virtual machine -> open the "IP address" tab. Add a new IP and delete the IP address that overlaps the cluster node IP. Change the settings of the network interface so that the virtual machine can work with the new IP, and restart the network with systemctl restart network.

  • When you add a cluster node: "Error installing 'vmmanager-kvm-pkg-vmnode' packages on the remote server. For additional information, please refer to the control panel's log"

Cause: libguestfs can be installed only in interactive mode.

Solution: install the vmmanager-kvm-pkg-vmnode package from the console on the cluster node.
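A hedged sketch, assuming a CentOS node with the ISPsystem repository already connected:

yum install vmmanager-kvm-pkg-vmnode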

  • Cannot add a node with CentOS

A possible cause of the issue: the required packages cannot be installed because the EPEL repository is not connected. This, in turn, can be caused by incorrect server time, which prevents the repository from being found. Correct the server time, if needed, and try again.
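A hedged sketch of the fix (assuming CentOS; the NTP server is an example):

ntpdate pool.ntp.org          # correct the server time
yum install epel-release      # connect the EPEL repository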

  • Cannot add a new node: "Cannot apply the Firewall rules: error in iptables rules"

In vmmgr.log you can see that the system could not start the /etc/libvirt/hooks/firewall.sh script:

# /etc/libvirt/hooks/firewall.sh
# Generated by VMmanager KVM on Sat Apr 18 21:31:18 CEST 2015
*filter
# ISPsystem firewall rules
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-F INPUT
-F FORWARD

COMMIT
ip6tables-restore v1.4.7: ip6tables-restore: unable to initialize table 'filter'

Error occurred at line: 2
Try `ip6tables-restore -h' or 'ip6tables-restore --help' for more information.

You need to comment out the lines disabling IPv6 in /etc/modprobe.d/ipv6.conf and load the ipv6 module again.
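A hedged sketch (the exact option line in ipv6.conf may differ on your system):

sed -i 's/^options ipv6 disable=1/#&/' /etc/modprobe.d/ipv6.conf    # comment out the disabling line
modprobe ipv6                                                       # load the IPv6 module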

  • Utilities for cluster management

Execute the command on all cluster nodes:

/usr/local/mgr5/sbin/nodectl --op exec --target all --cmd 'echo "Hello, world!"' 

Access the cluster node via SSH:

/usr/local/mgr5/sbin/nodectl login <cluster node id>

View a list of cluster nodes:

/usr/local/mgr5/sbin/nodectl list

Only for VMmanager Cloud. Change the master node (server priorities will change and VMmanager will start on the specified server):

/usr/local/mgr5/sbin/cloudctl -c relocate -m <cluster node id>

Restoring VMmanager Cloud

Restoring virtual machines

If a cluster node fails, the virtual machines running on that node will be recovered on an available node. Virtual disks of the machines must be located in network storage.

If a VM disk was located in local storage, the virtual machine won't be restored, as its data becomes inaccessible.

Virtual machines are recovered on the running cluster nodes right after a cluster node fails. The recovery process includes the following steps:

  • The system selects a cluster node where the virtual machine will be created (see Distribution of virtual machines on cluster nodes);
  • Creates the virtual machine with the same parameters as on the failed node;
  • Connects the network disk of the machine and restarts it.

The information about the recovery process is added to the VMmanager log vmmgr.log. During the restore operation the log contains entries like:

Restore vm $id

where id is the identifier of the virtual machine in the VMmanager database (the vm table in the vmmgr MySQL database).
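A hedged lookup sketch (the id value is an example; the name column is an assumption about the vm table layout):

mysql vmmgr -e 'SELECT id, name FROM vm WHERE id = 42'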

Restoring a cluster node

A cluster node is considered failed and is disconnected from the cluster if it does not respond for more than 1 second. Please note: a server reboot takes longer than that, so you will need to restore the cluster node after a reboot. Do not reboot several cluster nodes at the same time, so that the number of running nodes does not fall below the quorum.

When the cluster node becomes inaccessible, corolistener changes the configuration file on all the running nodes. To restore the node, complete the following steps:

  • Make sure that all the services on the node are up and running;
  • Add the cluster node into the quorum by clicking the "Join" button in "Cluster node" -> "Cluster nodes";
  • Synchronize the corosync configuration file with other nodes:
/usr/local/mgr5/sbin/mgrctl -m vmmgr cloud.conf.rebuild
  • Start corosync:
/etc/init.d/corosync start
  • Start corolistener:
/usr/local/mgr5/sbin/corolistener -c start

Restoring a cluster after the quorum is lost

VMmanager Cloud is replicated on all the cluster nodes to ensure fault tolerance of the control panel, but it runs only on one cluster node at a time. After the quorum is lost, it may happen that VMmanager cannot start even on the cluster node with the highest priority.

Cluster restore mechanism:

  • Perform the following operations on the master node:
1. Delete the "Option ClusterEnabled" line from the /usr/local/mgr5/etc/vmmgr.conf file;
2. Make sure the /tmp/.lock.vmmgr.service file is present. Create it if it is missing:
touch /tmp/.lock.vmmgr.service
3. Add the license IP address to the interface vmbr0:
ip addr add <IP address> dev vmbr0
  • Perform the following operations on the cluster nodes:
1. Make sure the /usr/local/mgr5/var/disable file is present. Create it if it is missing:
touch /usr/local/mgr5/var/disable
2. Make sure the /tmp/.lock.vmmgr.service file is not present. Delete it if needed:
rm /tmp/.lock.vmmgr.service
3. Make sure the license IP address is not present on the interface vmbr0.
  • Restart the control panel;
  • Enable the cloud function in the control panel.

Issues with storages

Issues with LVM-storage

  • Cannot create an LVM storage

When trying to add a new cluster node, you may get the error: "unsupported configuration: cannot find any matching source devices for logical volume group".

Make sure that the LVM commands (pvs, vgs) show that the physical volumes you want to add exist and that the volume group is up and running. Solution: undefine the storage pool, then try adding the storage and the cluster node again:

 virsh pool-undefine storage-name

Issues with network LVM-storage

  • Cannot find the volume group on cluster nodes

If the vgs command on the cluster nodes doesn't display the LVM volume group from the iSCSI storage, make sure that the iSCSI target exports it. Execute the following command:

tgtadm -m target --op show

If the partition is missing from the LUN list, add it manually:

tgtadm -m logicalunit --op new --tid 1 -b /dev/sda2 --lun 1

Restart tgtd, connect the cluster nodes to the target, and execute pvscan to detect the pool.

  • Error adding a new cluster node

After editing the /etc/tgt/targets.conf file (for example, when a new cluster node is connected and one more initiator-address is added), the cluster node cannot be connected: the 'iscsiadm -m discovery -t st -p ...' command returns 'iscsiadm: No portals found'. A possible cause of this issue: the tgtd service cannot restart and re-read the configuration file while clients are connected.

Make sure the service has been restarted:

service tgtd stop
killall -9 tgtd
service tgtd start

Attention! 'killall -9 tgtd' terminates the tgtd processes forcibly, so you may lose data.

Issues with iSCSI-storage

  • Requested operation is not valid: storage pool is not active

This error occurs in case of problems with the iSCSI storage. Make sure the tgtd service is running on the server with the storage. If the error occurs when adding a new node, access the node via SSH and execute:

[root@free ~]# virsh pool-list --all
Name                 State      Autostart
-----------------------------------------
File                 active     yes
iSCSI-UGLY_004       inactive   yes

If the iSCSI-UGLY_004 pool is inactive, try deleting the storage and adding the node once again:

[root@free ~]# virsh pool-undefine iSCSI-UGLY_004
Pool iSCSI-UGLY_004 has been undefined
  • internal error Child process (/sbin/iscsiadm --mode discovery --type sendtargets --portal xxx.xxx.xxx.xxx:3260,1) status unexpected: exit status 255

VMmanager cannot connect to the iSCSI server on port 3260. Possible causes of this issue:

1. SELinux blocks the connection. Disable SELinux;

2. The required port is blocked by the firewall settings.
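A hedged check-and-fix sketch (the iptables rule is an example; adjust it to your firewall setup):

setenforce 0                                           # switch SELinux to permissive mode, on the storage server
iptables -I INPUT -p tcp --dport 3260 -j ACCEPT        # open the iSCSI port, on the storage server
iscsiadm -m discovery -t st -p xxx.xxx.xxx.xxx:3260    # verify the portal is reachable from the node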

  • operation failed: Storage source conflict with pool: '...'

The cause of the error: a dir or netfs storage already exists on the server in the same directory where you are trying to create the new storage. Solution:

  • Delete the existing storage on all nodes:
virsh pool-list
virsh pool-dumpxml <pool-name>
virsh pool-destroy <pool-name>
virsh pool-undefine <pool-name>

Issues with RBD-storage

  • Cannot add a storage to cluster node
# ceph auth get-or-create client.vmmgr mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=isptest'
key for client.vmmgr exists but cap osd does not match

Solution: log in to the monitor and delete the client.vmmgr user:

# ceph auth del client.vmmgr
  • Change the IP address of the Ceph monitor

For newly created virtual machines: the administrator of the ceph storage changes the IP address on the Ceph monitor. Then, in the vmmgr MySQL database on the VMmanager master server, change the srchostname field in the metapool table to the new monitor address. In the rbdmonitor table, make sure that "metapool" corresponds to "id" from the metapool table.
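A hedged sketch of the database change (old_IP and new_IP are placeholders):

mysql vmmgr -e "UPDATE metapool SET srchostname='new_IP' WHERE srchostname='old_IP'"

Clear the cache: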

rm -rf /usr/local/mgr5/var/.db.cache.vm*

and restart the panel:

killall core

For existing virtual machines:

sed -i 's/old_IP/new_IP/g' /etc/libvirt/qemu/*.xml
for f in /etc/libvirt/qemu/*.xml; do virsh define "$f"; done    # virsh define takes one file at a time

Issues with GlusterFS-storage

  • When two different VMmanager storages (QCOW2 and RAW) are located on the same GlusterFS volume and directory, virtual disks cannot be moved between these storages

Solution: use different directories on the network storage.

  • Cannot import virtual machines from VMmanager or libvirt directly to GlusterFS, as the libvirt driver cannot write disk content from a stream.

Solution: use other types of storages, then move disks of imported machines into GlusterFS.

Issues with NFS-storage

  • Cannot create a virtual disk

libvirt cannot create a virtual disk in the network storage. In the log you can see the following information:

rpc.idmapd[706]: nss_getpwnam: name 'root@testers' does not map into domain 'ispsystem.net'

Solution: specify the correct "Domain" parameter in the /etc/idmapd.conf file on the server and client. Restart the server and client once completed, and add the storage again to VMmanager.

  • Error deleting data from storage

Cannot delete disks created in the NFS storage:

libvirt error when executing  "VolDelete": "cannot unlink file '/nfs-pool/volume': Permission denied"

In /var/log/messages you can see the following information:

Sep 16 13:11:07 client nfsidmap[7340]: nss_getpwnam: name 'www-data@lan' does not map into domain 'localdomain'

Solution: specify the correct "Domain" parameter in the /etc/idmapd.conf file on the server and client. Restart the server and client once completed, and add the storage again to VMmanager.

Issues with virtual disks

  • libvirt error when modifying disk space

libvirt error when executing the "Grow" operation: "unknown procedure: 260"

This error occurs if you are using older versions of libvirt. To resolve the issue, update libvirt.

  • "Failed to mount VM disks" error after password change

This error occurs when trying to change the password of a virtual machine whose XFS file system was created by kernel 3.10 or later, while the host server runs RHEL/CentOS 6 with ext4.

Run guestmount -v -x -a <disk_image> -i <path to mount point>. The "mount: wrong fs type, bad option, bad superblock" error means that libguestfs cannot mount the virtual machine's disk.

Cause of the issue: guest xfs is not fully supported by the RHEL/CentOS 6 kernel. For more information, please refer to virt-inspector can't obtain info from rhel7.3 guest image on rhel6.9 host.

Unfortunately, libguestfs developers say that this bug cannot be resolved.

Network issues

  • IPv6 on Ubuntu 12.04

The following error occurs when adding the IPv6 cluster node:

"Unable to conncet to the XXX server. SSH or libvirt-bin might not be running"

Logs:

Mar 13 13:26:09 [2157:0x95E700] virt TRACE ErrorCallback libvirt error code=38 message=Cannot recv data: ssh: external/libcrypto.so.1.0.0: 
no version information available (required by ssh)
: Connection reset by peertname [2a01:230:2:3::3]: Name or service not known
Mar 13 13:26:09 [2157:0x95E700] virt DEBUG vir_host.cpp:70 Connect to qemu+ssh://[2a01:230:2:3::3]/system?keyfile=etc/ssh_id_rsa
Mar 13 13:26:09 [2157:0x95E700] err ERROR Error: Type: 'vir_connection'
Mar 13 13:26:09 [2157:0x95E700] virt TRACE Fail libvirt message: 'Cannot recv data: ssh: external/libcrypto.so.1.0.0: no version 
information available (required by ssh)

This error occurs on Ubuntu 12.04. Link to launchpad

Solution: Add the IPv4 cluster node.

Issues with virtual machines

  • Error creating a virtual machine: "ERROR: Exception 1: Insufficient RAM for VM creation", even though there is enough swap space

Free RAM is calculated as free + cached; swap is not counted.
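A hedged sketch of checking this value on the node (kB values from /proc/meminfo, printed in MiB):

awk '$1=="MemFree:" || $1=="Cached:" {sum += $2} END {print sum/1024 " MiB"}' /proc/meminfo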

  • A virtual machine is not accessible after reboot.

What to check:

1. Cluster nodes are connected to the master server (the server where VMmanager is installed).

2. vmwatch-master on the master server listens on the correct IP address (the IP address is specified by the VmwatchListenIp parameter in the /usr/local/mgr5/etc/vmmgr.conf configuration file). If the parameter is not specified, the IP address of the local cluster node is used; if there is no local cluster node, the first IP address of the first interface is used. After you make the changes, execute the /usr/local/mgr5/sbin/mgrctl -m vmmgr vmwatch.configure command to reconfigure the services.

  • Cannot create a virtual machine

libvirt returns the error when executing "Start": "internal error Process exited while reading console log output: qemu-kvm: -chardev pty,id=charserial0: Failed to create chardev"

To resolve the issue, execute:

mount -n -t devpts -o remount,mode=0620,gid=5 devpts /dev/pts
  • libvirt error while executing "Start": "Unable to create cgroup for test: No such device or address"

The cause of the problem is in the Debian kernel. Restart the server with the cgroup_enable=memory kernel option.
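A hedged sketch for Debian with GRUB2 (check /etc/default/grub manually after editing):

sed -i 's/GRUB_CMDLINE_LINUX="/GRUB_CMDLINE_LINUX="cgroup_enable=memory /' /etc/default/grub
update-grub    # regenerate the boot configuration
reboot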

  • Cannot create a virtual machine

libvirt returns the error when executing "Start": "internal error cannot create rule since ebtables tool is missing."

Check the output of the lsmod | grep ebt command. If it doesn't return any data, ebtables is not supported by the kernel. You will need to recompile the kernel, or install another one and change the boot priority by modifying grub.conf.

OS deployment issues

  • The installer cannot download the response file

This error may occur when trying to connect to a virtual machine via VNC.

Possible causes:

1. IP-address of the virtual machine cannot be bound to the cluster node where the machine is created. This may happen if, for example, IP addresses in a data-center are bound to MAC-addresses;

2. The resolver on the virtual machine doesn't work (the virtual machine is assigned the first resolver from the parent server's /etc/resolv.conf file);

3. VMmanager is trying to send the response file through an internal IP address which is not accessible from the outside, while the installer downloads the response file via the external network;

4. Redirect for http-connection is set up in the ihttpd configuration file.

VMmanager receives the IP address, to which the response file will be sent, from ihttpd. Execute the following command to find it:

/usr/local/mgr5/sbin/ihttpd

The system takes the first IP address from the list that meets the protocol requirements. If the command shows an internal IP address first, the installer won't get the response file. Configure ihttpd as described in the article Configuring built-in web-server, so that the external IP address comes first.

  • Cannot receive the full preseed-file when installing Debian

Delete the "nocunked" option in the ihttpd - /usr/local/mgr5/etc/ihttpd.conf configuration file.

Restart ihttpd.

  • Cannot install Windows 2016 on the virtual machine: in VNC you can see that the boot process freezes

This error occurs on servers with QEMU 1.5, 0.12, 1.1.2.

Solution: enable the host-passthrough CPU emulation mode when creating the virtual machine. With QEMU 2.6 you don't need to change the emulation mode to install the templates, but you may need to restart libvirt (service libvirtd restart) if the system was updated.

  • The following errors may occur when installing FreeBSD-amd64 on certain types of processors:
  • The installation process hangs;
  • In VNC you can see an error screen (screenshot: FreeBSD installation error);
  • Pressing the buttons has no effect.

In such cases we recommend using FreeBSD x32 images. You can also install the kernel-lt kernel (http://elrepo.org/tiki/kernel-lt).

VM migration issues

  • Live migration error: virt TRACE ErrorCallback libvirt error code=38 message=Unable to read from monitor: Connection reset by peer

This error occurs in some configurations with libvirt 0.9.12.3 if "virtio" is selected as the VM network interface model. For successful migration, specify a different network device model in the VM network interface settings, for example "e1000". If you want to change this parameter for an existing virtual machine, you will need to restart it.
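A hedged sketch of changing the model for an existing machine (vm_name is a placeholder):

virsh edit vm_name        # in the <interface> section set: <model type='e1000'/>
virsh shutdown vm_name    # then, once the machine is off:
virsh start vm_name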

  • Internal error: unable to execute QEMU command 'migrate': this feature or command is not currently supported

This error occurs when trying to migrate a running virtual machine on a CentOS 7 cluster due to a QEMU bug on CentOS 7.

Possible solutions:

  1. Suspend the virtual machine and then migrate it;
  2. Install QEMU from the RedHat repository.

All commands are performed with root permissions in the console on every cluster node:

yum install centos-release-qemu-ev
yum update

Warning: all the installed packages will be updated (if new updates are available for them).

The list of updated packages includes the following:

libcacard-ev
qemu-img-ev
qemu-kvm-common-ev
qemu-kvm-ev
qemu-kvm-tools-ev

After QEMU/KVM is updated, restart virtual machines and libvirtd.

To move a VM between two VMmanager installations, add the source server as a cluster node in the new VMmanager and move the virtual machine using VMmanager tools.

  • Cannot migrate and back up virtual machines with qcow2 disks on servers running QEMU 2.6

Versions: QEMU 2.6, libvirt 2.0.

This is a QEMU 2.6 bug, which was fixed in QEMU 2.7.

Solution:

  • Restart the virtual machine;
  • Update QEMU to 2.7. This variant has not been tested yet, and we cannot guarantee it will help. Use it at your own risk.

  • LVM disk size is increasing during VM migration

This is a QEMU bug: https://bugs.launchpad.net/qemu/+bug/1449687, https://bugzilla.redhat.com/show_bug.cgi?id=1219541

It can be fixed only in QEMU. There are two workarounds: migrate the virtual machine in a suspended state, or use qemu-img convert after migration to compress the disk.
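A hedged compression sketch (the source device and output path are hypothetical):

qemu-img convert -O qcow2 -c /dev/vg0/vm_disk /backup/vm_disk.qcow2    # -c enables compression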


  • Attempt to migrate guest to the same host

If VM migration failed and var/migratevm.log contains the above error, check the following:

  • All of the cluster nodes have the same hostname.

Solution: modify the hostname. Edit the /etc/hostname and /etc/hosts files, replacing the old hostname with a new one.

  • The cluster nodes have the same product_uuid.

Run

cat /sys/class/dmi/id/product_uuid 

on all the cluster nodes. If their values coincide, you may use the following solution:

Solution: edit the /etc/libvirt/libvirtd.conf file on the cluster nodes that have the same product_uuid. Locate the commented-out line

#host_uuid = "00000000-0000-0000-0000-000000000000"

Uncomment it and specify a unique host_uuid value on each node. The value must not consist of identical digits.
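A hedged sketch for generating a suitable value:

uuidgen    # prints a random UUID to use as host_uuid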

Restart libvirt.

  • libvirt failed while executing the "Define" operation: "unknown OS type hvm"

Reboot the server. If the problem persists:

1. Make sure that virtualization is enabled in BIOS:

modprobe kvm
egrep '^flags.*(vmx|svm)' /proc/cpuinfo

If the egrep command returns no output, hardware virtualization is disabled and you should enable it.

2. The kvm service must be enabled:

service kvm status
  • Cannot migrate the virtual machine <vm name>: the backup process is in progress

If the backup process is running in the control panel and the virtual machine is in the backup list, all operations with that machine are blocked until the backup process is over.