Wednesday, September 28, 2016

APCUPSD again

Thank God I actually recorded what I was doing years ago on this little micro server.


I recently upgraded my USB mirrored boot sticks for OmniOS to an SSD.

Moving the system over was a little daunting. I followed a guide here, which was quite long. So who knows if in future years it will still exist.

I also posted about it here but didn't receive a lot of help. Basically Gea just told me to reinstall instead of migrate my rpool. My rpool is now called syspool. I also upgraded OmniOS from the old LTS r151006 to the latest stable r151018. I also updated napp-it, although I find I am doing lots of stuff via CLI.

I also bought a small monitor to go with my USB bluetooth keyboard so I could actually see what was going on if I can't log in.

But all that aside, I experienced a power outage, and my damn UPS didn't seem to work. Which also made me wonder if it had worked, would OmniOS even be aware of it? So I set about googling for help, and actually stumbled across my old blog post from 2013. Funny, I wasn't even aware it was me/mine and so while I was reading the post about buying the micro server I was like "OMG, this person did EXACTLY what I did (except they have 6 1TB 2.5" drives)..."

Anyway, I want to get APCUPSD working again, so let me reiterate what has happened.
I found the old Nevada Solaris iso here. I was also following this guide here.

# cd /tmp
# wget http://web.cs.sunyit.edu/network/downloads/OperatingSystems/Solaris/x86/Solaris_Express_Developer_Edition_1-08/sol-nv-b79b-x86-dvd.iso

# lofiadm -a /tmp/sol-nv-b79b-x86-dvd.iso
/dev/lofi/1
# mount -F hsfs /dev/lofi/1 /media
# ls -ld /media/Solaris_11/Product/SUNWlibusb*
/media/Solaris_11/Product/SUNWlibusb
/media/Solaris_11/Product/SUNWlibusbugen


# pkgadd -d /media/Solaris_11/Product SUNWlibusb 
# pkgadd -d /media/Solaris_11/Product SUNWlibusbugen 
Verify that the packages installed:
# pkgchk -v SUNWlibusb
/usr
/usr/sfw
/usr/sfw/bin
/usr/sfw/bin/libusb-config
/usr/sfw/lib
/usr/sfw/lib/libusb.so
/usr/sfw/lib/libusb.so.1
/usr/sfw/lib/libusb_plugins
/usr/sfw/share
/usr/sfw/share/doc
/usr/sfw/share/doc/libusb
/usr/sfw/share/doc/libusb/libusb.txt

# pkgchk -v SUNWlibusbugen
/usr
/usr/sfw
/usr/sfw/lib
/usr/sfw/lib/libusb_plugins
/usr/sfw/lib/libusb_plugins/libusbugen.so
/usr/sfw/lib/libusb_plugins/libusbugen.so.1

# touch /reconfigure
# shutdown -y -g0 -i6


Log back in
# cd /tmp
# vi /var/adm/messages
Look in the messages file for anything related to USB. You may want to search for the current date first to jump down to the most recent messages:
   /Sep 27
   /usba
Sep 27 15:08:26 tyrion usba: [ID 349649 kern.info]      American Power Conversion Back-UPS XS 1500 LCD FW:837.H4 .D USB FW:H4  JB0644015364
Sep 27 15:08:26 tyrion genunix: [ID 936769 kern.info] hid5 is /pci@0,0/pci103c,1609@12/input@5

Sep 27 15:08:26 tyrion genunix: [ID 408114 kern.info] /pci@0,0/pci103c,1609@12/input@5 (hid5) online
This indicated that it was loaded as a HID USB device (hid5), not as a ugen device, which is what apcupsd needs.

So first, get the hardware address:
# dmesg > dump
# vi dump
Search for usb51d (usb51d,2 is the USB vendor,product ID pair for this UPS; 51d is APC's vendor ID).
Sep 27 15:08:26 tyrion usba: [ID 912658 kern.info] USB 1.10 device (usb51d,2) operating at low speed (USB 1.x) on USB 1.10 root hub: input@5, hid5 at bus address 2
Sep 27 15:08:26 tyrion usba: [ID 349649 kern.info]      American Power Conversion Back-UPS XS 1500 LCD FW:837.H4 .D USB FW:H4  JB0644015364
Sep 27 15:08:26 tyrion genunix: [ID 936769 kern.info] hid5 is /pci@0,0/pci103c,1609@12/input@5
Sep 27 15:08:26 tyrion genunix: [ID 408114 kern.info] /pci@0,0/pci103c,1609@12/input@5 (hid5) online

Now load the device as a ugen device:
# add_drv -m '* 0666 root sys' -i 'usb51d,2' ugen

# touch /reconfigure
# shutdown -y -g0 -i6


Log back in
# cd /tmp
# dmesg > dump
Search for ugen:
Sep 27 16:01:30 tyrion usba: [ID 912658 kern.info] USB 1.10 device (usb51d,2) operating at low speed (USB 1.x) on USB 1.10 root hub: input@5, ugen0 at bus address 2
Sep 27 16:01:30 tyrion usba: [ID 349649 kern.info]      American Power Conversion Back-UPS XS 1500 LCD FW:837.H4 .D USB FW:H4  JB0644015364
Sep 27 16:01:30 tyrion genunix: [ID 936769 kern.info] ugen0 is /pci@0,0/pci103c,1609@12/input@5
Sep 27 16:01:30 tyrion genunix: [ID 408114 kern.info] /pci@0,0/pci103c,1609@12/input@5 (ugen0) online
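
Since I keep eyeballing these logs for the same thing, here is a minimal sketch of a shell helper that pulls out which driver instance claimed the UPS. The name usb_binding is my own invention, and it assumes the kernel log lines look like the ones pasted above.

```shell
# usb_binding (hypothetical helper): read kernel log text on stdin and print
# the driver instance that most recently claimed the given USB ID.
usb_binding() {
    usb_id="$1"
    # Keep only the "USB ... device (<id>) operating at ..." lines, then take
    # the instance name between the last ", " and " at bus address".
    grep -F "device ($usb_id)" |
        sed -n 's/.*, \([a-z]*[0-9]*\) at bus address.*/\1/p' |
        tail -1
}

# Usage on the live system:
#   dmesg | usb_binding 'usb51d,2'
```

Fed the 15:08 line above it prints hid5; fed the 16:01 line it prints ugen0.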

Also
# vi /var/adm/messages
Sep 27 15:42:01 tyrion usba: [ID 912658 kern.info] USB 1.10 device (usb51d,2) operating at low speed (USB 1.x) on USB 1.10 root hub: input@5, ugen0 at bus address 2
Sep 27 15:42:01 tyrion usba: [ID 349649 kern.info]      American Power Conversion Back-UPS XS 1500 LCD FW:837.H4 .D USB FW:H4  JB0644015364


# wget http://heanet.dl.sourceforge.net/project/apcupsd/apcupsd%20-%20Stable/3.14.14/apcupsd-3.14.14.tar.gz

# gunzip apcupsd-3.14.14.tar.gz
# tar xf apcupsd-3.14.14.tar
# cd apcupsd-3.14.14
# ./configure \
  --prefix=/usr/local \
  --sbindir=/usr/local/sbin \
  --sysconfdir=/etc/apcupsd \
  --mandir=/usr/local/share/man \
  --with-log-dir=/var/log \
  --disable-cgi \
  --enable-usb
If you do a "make install" or "gmake install" at this point you'll get a failure
/tmp/apcupsd-3.14.14/include/libusb.h:9:17: fatal error: usb.h: No such file or directory
compilation terminated.

Some info on that can be found here 


'make' fails on OpenSolaris because by default it is missing a usb.h header file located in /usr/sfw/include. This can be retrieved by running this command inside that directory as root.

    # wget http://src.opensolaris.org/source/raw/sfw/usr/src/lib/libusb/inc/usb.h
Binaries are stored inside /etc/opt/apcupsd/sbin.
The file apcupsd.conf is needed to configure this USB device. Two lines that say the following must be included in this file.
UPSCABLE usb
UPSTYPE usb
Delete any other UPSCABLE/UPSTYPE lines. The apcupsd.conf file documents other types of APC UPS devices as well, usually those that rely on serial as opposed to usb.
Now, you should be able to run the daemon and you can verify that it will notify you by simply pulling the plug on the UPS. If you modify your root alias in /etc/mail/aliases the UPS will send you an e-mail when the power goes out. This sounds handy if I'm on campus and I lose the electricity in my apartment, so I'll have plenty of time to power off my machine remotely.
Also in the comments: 

2. apcupsd-3.14.4 has a bug that causes 'make install' to fail on Solaris. This is fixed in the latest 3.14.x CVS branch, so presumably a future 3.14.5 or whatever would be fine. To download from cvs, install the SUNWcvs package, then use the following command to download it.
cvs -z3 -d:pserver:anonymous@apcupsd.cvs.sourceforge.net:/cvsroot/apcupsd co -d apcupsd-3.14-cvs -rBranch-3_14 apcupsd
Update it later by simply running 'cvs update' from inside the apcupsd-3.14-cvs directory.

Most annoyingly, the file that was missing is /usr/include/usb.h

You can't really install libusb from anywhere, and the old distros of OpenSolaris are disappearing from download sites all over the web. The version I got above did apparently install libusb, but was missing the usb.h file.

I found a source for the file here, and just copied and pasted it into a new usb.h file.
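
Before re-running the build it is worth confirming the header really landed somewhere sensible. This is only a sketch: find_header is a name I made up, and the two directories in the usage line are just the ones mentioned in this post, not a claim about where the compiler actually searches.

```shell
# find_header (hypothetical helper): print the first path where the named
# header exists among the given directories, or fail if none has it.
find_header() {
    header="$1"
    shift
    for dir in "$@"; do
        if [ -f "$dir/$header" ]; then
            echo "$dir/$header"
            return 0
        fi
    done
    echo "$header not found in: $*" >&2
    return 1
}

# Usage with the two locations mentioned in this post:
#   find_header usb.h /usr/include /usr/sfw/include
```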

Then try make install again.
# make install
        src
        src/lib
        src/drivers
        src/drivers/apcsmart
        src/drivers/dumb
        src/drivers/net
        src/drivers/pcnet
        src/drivers/usb
        src/drivers/usb/generic
        src/drivers/snmplite
        src/drivers/modbus
        src/libusbhid
  COPY  apcupsd => /usr/local/sbin/apcupsd
  COPY  apctest => /usr/local/sbin/apctest
  COPY  apcaccess => /usr/local/sbin/apcaccess
  COPY  smtp => /usr/local/sbin/smtp
        platforms
        platforms/etc
  MKDIR /etc/apcupsd
  COPY  apcupsd.conf => /etc/apcupsd/apcupsd.conf
  COPY  changeme => /etc/apcupsd/changeme
  COPY  commfailure => /etc/apcupsd/commfailure
  COPY  commok => /etc/apcupsd/commok
  COPY  offbattery => /etc/apcupsd/offbattery
  COPY  onbattery => /etc/apcupsd/onbattery
        platforms/sun
  ------------------------------------------------------------
  Sun distribution installation
  ------------------------------------------------------------
  COPY  apcupsd => /etc/init.d/apcupsd
  LN    //etc/rc0.d/K21apcupsd -> ../init.d/apcupsd
  LN    //etc/rc1.d/S89apcupsd -> ../init.d/apcupsd
  LN    //etc/rc2.d/S89apcupsd -> ../init.d/apcupsd
=================================================
apcupsd script installation for Solaris Solaris complete.
You should now edit /etc/apcupsd/apcupsd.conf  to correspond
to your setup then start the apcupsd daemon with:

/etc/init.d/apcupsd start

Thereafter when you reboot, it will be stopped and started
automatically.
=================================================
Configuring ugen driver to match APC UPSes...

("ugen") already in use as a driver or alias.

NOTE:
   "(usbif51d,class3) already in use" and
   "Driver (ugen) is already installed"
   errors may be safely ignored.

=================================================
Driver configured. You must PERFORM A RECONFIGURE
BOOT "reboot -- -r" before running Apcupsd.
=================================================
  COPY  apccontrol => /etc/apcupsd/apccontrol
        doc
  MKDIR /usr/local/share/man/man8
  COPY  apcupsd.8 => /usr/local/share/man/man8/apcupsd.8
  COPY  apcaccess.8 => /usr/local/share/man/man8/apcaccess.8
  COPY  apctest.8 => /usr/local/share/man/man8/apctest.8
  COPY  apccontrol.8 => /usr/local/share/man/man8/apccontrol.8
  MKDIR /usr/local/share/man/man5
  COPY  apcupsd.conf.5 => /usr/local/share/man/man5/apcupsd.conf.5

Check that the USB device nodes have appeared
# ls /dev/usb/51d.2/*
cntrl0  cntrl0stat  devstat  if0in1  if0in1stat




From : http://www.apcupsd.org/manual/manual.html#sun-solaris

In order to support unattended operation and shutdown during a power failure, it's important that the UPS remove power after the shutdown completes. This allows the unattended UPS to reboot the system when power returns by re-powering the system. Of course, you need autoboot enabled for your system to do this, but all Solaris systems have this by default. If you have disabled this on your system, please re-enable it.
To get the UPS to remove power from the system at the correct time during shutdown, i.e., after the disks have done their final sync, we need to modify a system script. This script is /sbin/rc0.
We do not have access to every version of Solaris, but we believe this file will be almost identical on every version. Please let us know if this is not true.
At the very end of the /sbin/rc0 script, you should find lines just like the following:
# unmount file systems. /usr, /var and /var/adm are not unmounted by umountall
# because they are mounted by rcS (for single user mode) rather than
# mountall.
# If this is changed, mountall, umountall and rcS should also change.
/sbin/umountall
/sbin/umount /var/adm >/dev/null 2>&1
/sbin/umount /var >/dev/null 2>&1
/sbin/umount /usr >/dev/null 2>&1

echo 'The system is down.'
We need to insert the following lines just before the last 'echo':
#see if this is a powerfail situation
if [ -f /etc/apcupsd/powerfail ]; then
        echo
        echo "APCUPSD will power off the UPS"
        echo
        /etc/apcupsd/apccontrol killpower
        echo
        echo "Please ensure that the UPS has powered off before rebooting"
        echo "Otherwise, the UPS may cut the power during the reboot!!!"
        echo
fi

We have included these lines in a file called rc0.solaris in the distributions/sun subdirectory of the source tree. You can cut and paste them into the /sbin/rc0 file at the correct place, or yank and put them using vi or any other editor. Note that you must be root to edit this file.
You must be absolutely sure you have them in the right place. If your /sbin/rc0 file does not look like the lines shown above, do not modify the file. Instead, email a copy of the file to the maintainers, and we will attempt to figure out what you should do. If you mess up this file, the system will not shut down cleanly, and you could lose data. Don't take the chance.

[I did not do this modification because my file did not look like this]
I noticed someone asked this same question here, and the response they got was


In message <CABEuEyoSx0yhBj+iMqKqQ1xximQDHnzfFckJ8rDiybqYyJTKzA at mail.gmail.com>, Sebastien Messier writes:
>So I have set-up almost everything for my apc ups to run on my server but
>now I face a problem. They say I need to modify my /sbin/rc0 script and I
>really dont want to fuck shit up as I'm aware this is quite a serious file.
I've never found it necessary to modify /sbin/rc0.
apcupsd hits the BATTERYLEVEL or MINUTES thresholds in
/etc/opt/apcupsd/apcupsd.conf and shutdown(1M)'s.
What do your tests reveal?
John
groenveld at acm.org


You will then need to make the normal changes to the /etc/apcupsd/apcupsd.conf file. This file contains the configuration settings for the package. It is important that you set the values to match your UPS model and cable type, and the serial port that you have attached the UPS to. People have used both /dev/ttya and /dev/ttyb with no problems. You should be sure that logins are disabled on the port you are going to use, otherwise you will not be able to communicate with the UPS. If you are not sure that logins are disabled for the port, run the 'admintool' program as root, and disable the port. The 'admintool' program is a GUI administration program, and required that you are running CDE, OpenWindows, or another XWindows program such as KDE.
Solaris probes the serial ports during boot, and during this process, it toggles some handshaking lines used by dumb UPSes. As a result, particularly for simple signalling "dumb" UPSes it seems to kick it into a mode that makes the UPS think it's either in a calibration run, or some self-test mode. Since at this point we are really not communicating with the UPS, it's pretty hard to tell what happened. But it's easy to prevent this, and you should. Disconnect the UPS, and boot the system. When you get to a login prompt, log in as root. Type the following command:
eeprom com1-noprobe=true
or
eeprom com2-noprobe=true
depending on which com port your UPS is attached to. Then sync and shutdown the system normally, reattach the UPS, and reboot. This should solve the problem. However, we have some reports that recent versions of Solaris (7 & 8) appear to have removed this eeprom option and there seems to be no way to suppress the serial port probing during boot.

------------------------------------------

[Edit /etc/apcupsd/apcupsd.conf to reflect
 UPSTYPE as 'usb',
 DEVICE left blank, and
 UPSCABLE as 'usb']
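
A quick sanity check on the config before starting the daemon can save a confusing failure later. The sketch below assumes the two directives the quoted post says are required (UPSCABLE usb and UPSTYPE usb) and that any leftover lines for them should be commented out or deleted. check_usb_conf is a hypothetical helper name, not something that ships with apcupsd.

```shell
# check_usb_conf (hypothetical helper): confirm that the config file has
# exactly one active UPSCABLE line and one active UPSTYPE line, both "usb".
check_usb_conf() {
    conf="$1"
    # Collect the values of all uncommented directives, comma-separated.
    cables=$(awk '$1 == "UPSCABLE" { print $2 }' "$conf" | paste -s -d , -)
    types=$(awk '$1 == "UPSTYPE" { print $2 }' "$conf" | paste -s -d , -)
    if [ "$cables" = "usb" ] && [ "$types" = "usb" ]; then
        echo "ok"
    else
        echo "mismatch: UPSCABLE='$cables' UPSTYPE='$types'"
        return 1
    fi
}

# Usage:
#   check_usb_conf /etc/apcupsd/apcupsd.conf
```

A leftover second UPSCABLE line shows up as something like UPSCABLE='smart,usb', which is the hint to go delete it.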


I tried enabling the daemon at this point, but it failed
# /etc/init.d/apcupsd start
Starting apcupsd power management ...ld.so.1: apcupsd: fatal: libusb.so.1: open failed: No such file or directory
/etc/init.d/apcupsd: line 24: 15893: Killed
        Failed.

Uh-oh, another missing file?
# find / -name libusb.so.1
find: ‘/proc/16084/fd/5’: No such file or directory
find: ‘/proc/16084/path/5’: No such file or directory
/usr/sfw/lib/libusb.so.1

Seems like I have it, but it isn't where the runtime linker looks for it. Maybe I need to export the path?
# export LD_LIBRARY_PATH=/usr/sfw/lib/

Then the daemon worked.
# /etc/init.d/apcupsd start
Starting apcupsd power management ... Done.
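
The export only lasts for my login shell, so it will not survive a reboot. Here is a minimal sketch of a helper that adds the directory only once. add_lib_dir is a made-up name, and dropping something like this near the top of /etc/init.d/apcupsd is my assumption about where it would need to live, not something I have tested. Solaris also has crle(1) for changing the linker's default search path system-wide, which may be the cleaner fix.

```shell
# add_lib_dir (hypothetical helper): prepend a directory to LD_LIBRARY_PATH
# unless it is already listed, then export the result.
add_lib_dir() {
    dir="${1%/}"                       # drop a trailing slash for comparison
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$dir:"*|*":$dir/:"*) ;;     # already present, nothing to do
        *) LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

# The directory where find located libusb.so.1 above.
add_lib_dir /usr/sfw/lib
```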

But an error occurred during the test

# /etc/init.d/apcupsd status

Error contacting apcupsd @ localhost:3551: Connection refused
# /etc/init.d/apcupsd stop
Stopping apcupsd power management ... Failed.

# /usr/local/sbin/apctest


2016-09-27 19:39:23 apctest 3.14.14 (31 May 2016) sun
Checking configuration ...
sharenet.type = Network & ShareUPS Disabled
cable.type = USB Cable
mode.type = USB UPS Driver
apctest FATAL ERROR in apctest.c at line 313
Unable to create UPS lock file.
  If apcupsd or apctest is already running,
  please stop it and run this program again.
  apctest error termination completed

I got this far in one night, and then I had to take a break.

"At this point, you should have a complete installation. The daemon will load automatically at the next boot. Watch for any error messages during boot, and check the event logs in /etc/apcupsd. If everything looks OK, you can try testing the package by removing power from the UPS. NOTE! if you have a voltage-signalling UPS, please run the first power tests with your computer plugged into the wall rather than into the UPS. This is because dumb serial-port UPSes have a tendency to power off if your configuration or cable are not correct."

-----

Night 2:

Reboot the machine, re-export LD_LIBRARY_PATH, and run apctest again.

# /usr/local/sbin/apctest


2016-09-28 21:41:39 apctest 3.14.14 (31 May 2016) sun
Checking configuration ...
sharenet.type = Network & ShareUPS Disabled
cable.type = USB Cable
mode.type = USB UPS Driver
Setting up the port ...
Doing prep_device() ...

You are using a USB cable type, so I'm entering USB test mode
Hello, this is the apcupsd Cable Test program.
This part of apctest is for testing USB UPSes.

Getting UPS capabilities...SUCCESS

Please select the function you want to perform.

1)  Test kill UPS power
2)  Perform self-test
3)  Read last self-test result
4)  View/Change battery date
5)  View manufacturing date
6)  View/Change alarm behavior
7)  View/Change sensitivity
8)  View/Change low transfer voltage
9)  View/Change high transfer voltage
10) Perform battery calibration
11) Test alarm
12) View/Change self-test interval
 Q) Quit

EXCELLENT


# /etc/init.d/apcupsd start

Starting apcupsd power management ... Done.

# /etc/init.d/apcupsd status
APC      : 001,036,0869
DATE     : 2016-09-28 21:45:26 -0400  
HOSTNAME : tyrion
VERSION  : 3.14.14 (31 May 2016) sun
UPSNAME  : tyrion
CABLE    : USB Cable
DRIVER   : USB UPS Driver
UPSMODE  : Stand Alone
STARTTIME: 2016-09-28 21:45:24 -0400  
MODEL    : Back-UPS XS 1500 LCD 
STATUS   : ONLINE 
LINEV    : 120.0 Volts
LOADPCT  : 6.0 Percent
BCHARGE  : 64.0 Percent
TIMELEFT : 56.6 Minutes
MBATTCHG : 10 Percent
MINTIMEL : 5 Minutes
MAXTIME  : 0 Seconds
SENSE    : Medium
LOTRANS  : 88.0 Volts
HITRANS  : 139.0 Volts
ALARMDEL : 30 Seconds
BATTV    : 27.6 Volts
LASTXFER : Automatic or explicit self test
NUMXFERS : 0
TONBATT  : 0 Seconds
CUMONBATT: 0 Seconds
XOFFBATT : N/A
SELFTEST : WN
STATFLAG : 0x05000008
SERIALNO : JB0644015364  
BATTDATE : 2013-06-11
NOMINV   : 120 Volts
NOMBATTV : 24.0 Volts
NOMPOWER : 865 Watts
FIRMWARE : 837.H4 .D USB FW:H4
END APC  : 2016-09-28 21:45:29 -0400  
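
That status dump is a lot to scan when all I want is one number. A rough sketch of a field extractor follows. ups_field is my own name for it, and it assumes the "NAME : value" layout shown above, which is also what /usr/local/sbin/apcaccess prints.

```shell
# ups_field (hypothetical helper): read status text on stdin and print the
# value of one field with the surrounding spaces trimmed.
ups_field() {
    awk -F: -v key="$1" '
        {
            name = $1
            gsub(/[ \t]+$/, "", name)            # "STATUS   " -> "STATUS"
        }
        name == key {
            # Everything after the first colon, so timestamps stay intact.
            value = substr($0, index($0, ":") + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", value)   # trim both ends
            print value
            exit
        }'
}

# Usage:
#   /usr/local/sbin/apcaccess status | ups_field BCHARGE
```

Splitting on the first colon only is deliberate, since fields like DATE and STARTTIME contain colons of their own.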

Now the damn thing is complaining that it needs a new battery again. The old battery is from June of 2013 apparently.

Broadcast Message from root (???) on tyrion Wed Sep 28 21:55:24...
Emergency! UPS batteries have failed
Change them NOW

*************************
1) New AGM (Absorbent Glass Mat) batteries are on order. <check>
2) We can set the new battery date with /usr/local/sbin/apctest, option 4
3) We should do a test (pull the UPS power cable) to make sure
   a) the server shuts down gracefully
   b) when the power is restored to the UPS, the server comes back on
   c) when the server comes back up, that the daemon is started and that we don't need to manually export the library path again first.

If useful, re-read the APCUPSD user manual here:
http://www.apcupsd.org/manual/manual.html#sun-solaris

Lots of useful info, especially beginning with the section on how to run the tests.

Also, perhaps useful, would be this page, section 4.3 Setting Library Search Paths.....

4.3.4.2 Specifying Dynamic Libraries at Runtime