Wednesday, September 04, 2013

post-mortem: what went wrong last week in Puerto Rico


During the week of August 26th - 30th, two members of the CHAMP team (Rachel Kotkowski and Mike Jankulak) visited the CREWS station at La Parguera, PR.  Our goals were:

  • Identify and fix whatever had taken the station offline last March.
  • Swap out all instruments, since it is now one year after last August's swapout.
  • Move the "deep" instruments several metres higher.

Unfortunately we encountered problems from start to finish and fell short of nearly all of these goals, to wit:

  • The station lost power again on August 31st, only two days after reinstallation.
  • Both CTDs on the station failed the same day they were installed.
  • The mounting platform for the deep BIC (light sensor) is broken and the sensor is currently hose-clamped directly to the pylon.
  • Our UPR CREWS contact was unexpectedly absent and we had to work across a language barrier with her replacement.

Arguably the station is worse off compared to how we found it at the beginning of the week, and we still have no clear idea of what the problems are.  Below I will go into more detail about each of the issues listed above.

First, the power failure.  I had last visited the station in January and in the weeks following that visit I noticed that it seemed to be running off of battery power only, with no daily recharge from the solar panels.  At the time I guessed that the solar panel connector-plug had not been fully plugged in at the top of the pylon, or perhaps that one of the connector's pins had broken off.  Last Tuesday I recovered the "brain" (control package) and both sets of batteries from the station and brought them back to land.  I found the batteries drained to about 3.5V (compared to a normal range from 12V to 14V), and I connected them to a power supply for a quick charge back to near-normal levels.

However, I could find no evidence of loose pins, broken wires, or failed connectors, and both fuses in the brain were intact.  Far more confusing was the evidence in the logger's memory cache of what had been going on after the transmissions failed in mid-March.  As I'd thought, the station from January to March appeared to be slowly draining its batteries until levels were too low for transmitting.  But after a few weeks of no activity at all, the station in early April appeared (by evidence of its power levels) to have developed the opposite problem from before.  Instead of batteries connected without solar panels, from April to August the station appeared to have solar panels connected without batteries.  The new pattern of voltages began April 4th, which is two days after one of the monthly cleanings on April 2nd.  I do not know if there is any logical connection between these two events.

Above is a graph of power levels since the beginning of 2013 (the x-axis is labeled with the day of year, from 0 to 365, and the y-axis is voltage).  There is a normal diurnal voltage pattern until my visit at the end of January, after which battery charging occurs on only a few days and the power levels otherwise decline steadily.  In mid-March the power levels fall below 9V and all station activity ceases.  However, on April 4th the station power levels start fluctuating wildly between 0V at night and more than 17V in the daytime, which is a pattern I have seen when a station has lost its connection to its batteries but still has solar panels plugged in.  In this situation the station generally doesn't have a steady enough power supply to complete even one full transmission, although we did receive just one report on June 2nd (which led to a lot of head-scratching, since at that point we believed the station to be entirely without power).
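
The three patterns described above (normal diurnal charging, batteries without solar panels, and solar panels without batteries) can in principle be told apart from each day's lowest and highest logged voltage.  What follows is only a minimal Python sketch of that idea, not code that runs on the station.  The function name, the labels, and the exact thresholds are assumptions chosen for illustration; only the rough voltages (a normal range of 12V to 14V, a cutoff near 9V, and swings from 0V to more than 17V) come from this post.

```python
# Hypothetical helper for illustration only. Thresholds are assumptions
# loosely based on the voltages quoted in this post.

def classify_power_day(v_min, v_max, prev_v_max=None):
    """Label one day of logged station voltages.

    v_min, v_max: lowest and highest voltage logged that day.
    prev_v_max:   the previous day's highest voltage, if known.
    """
    # Solar panels without batteries: near 0V at night, well above the
    # normal 14V battery ceiling in daylight.
    if v_min < 1.0 and v_max > 14.0:
        return "solar-only"
    # Below roughly 9V all station activity ceases.
    if v_max < 9.0:
        return "below-cutoff"
    # Batteries without solar panels: almost no daytime rise (assumed to
    # mean a swing under 0.5V) and a peak lower than the day before.
    if prev_v_max is not None and v_max < prev_v_max and (v_max - v_min) < 0.5:
        return "battery-only"
    return "normal"


if __name__ == "__main__":
    # Made-up readings, one per pattern.
    print(classify_power_day(12.1, 13.9))                    # normal
    print(classify_power_day(11.0, 11.2, prev_v_max=11.5))   # battery-only
    print(classify_power_day(8.2, 8.5))                      # below-cutoff
    print(classify_power_day(0.0, 17.3))                     # solar-only
```

A check like this would not explain why a connection failed, but it might flag the change from one pattern to another sooner than a manual look at the graph.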

As I've said, I could not find any "smoking gun" to explain these problems.  The evidence suggests that there have been intermittently faulty connections to the batteries and to the solar panels, but not at the same time.  I'd brought a new charger-controller with me and, since this was one element that connects to both batteries and solar panels, I replaced it in the hopes that improvements would follow.  I also speculated that perhaps some of the brain hardware (such as the on/off switch) might be causing the problem.  The "brain" is made up of a backboard of fibreglass with many simple hardware elements screwed or soldered on permanently.  I have often replaced the main electronics components (datalogger, SIO4 serial-port units, satellite transmitter, RF radio) but I have never replaced the baseline hardware elements, nor do I have the tools or expertise to do so.  To my knowledge none of these simpler hardware elements has ever failed at a CREWS station before now.

In any case I reinstalled the brain on Thursday, knowing that it would likely take several days of charging to know whether the solar panels and batteries were working properly again.  In fact it took only two days for transmissions to fail this time, probably because the batteries had been drained for months and two days of charging them off my small power supply was not enough to keep them going very long.

I am unsure what might be required to fix the power problems now.  One conservative approach would be to build an entirely new "brain" package, including all switches, connectors, screw-down boards and fuses.  Even this, however, is not guaranteed to help if there is a wiring problem with the permanently-installed solar panels on the station.  Also, the wiring of the solar panels at this station is different from all other CREWS stations past or present, and replacing the brain entirely might require careful examination and testing of those solar panels (or in the worst case, replacement of the solar panels).

Next, the CTDs:  As usual I reinstalled all station equipment and powered it on, then returned to the boat to connect by radio from my laptop to the station to verify that everything was working properly.  The CTDs have a more complicated program than most instruments -- the datalogger essentially "programs" them to run in interval mode and sets their clock according to its GPS-derived time.  The CTDs when online will spontaneously offer up new readings every 6 minutes but it often takes 6, 12 or even 18 minutes after startup before they start reporting.  So I checked in every 6 minutes and after a few cycles both CTDs came online at once.

It wasn't until I was back at my hotel that evening, when I was setting up the data-parsing routines to publish the new data formats on our web site, that I noticed that the Shallow CTD had stopped reporting within about an hour of startup.  At first I worried that I had done some kind of damage to the instrument's connectors, either up top or underwater at the instrument, and so interrupted communications.  But nine hours after this failure the Deep CTD also went offline.  We've had many problems with this make of CTD (formerly Falmouth, now RDI-Teledyne) but the same failure mode in two instruments suggests a programming error or a communications equipment failure.
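
Because the CTDs are expected to report every 6 minutes once they are online, a silent instrument can be spotted from the report times alone.  The following Python sketch illustrates one way such a check might look.  It is an assumption-laden example rather than anything deployed on our systems: the function name is hypothetical, the tolerance of three missed reports (18 minutes) is my own choice, and the timestamps are made up.

```python
from datetime import datetime, timedelta

REPORT_INTERVAL = timedelta(minutes=6)  # reporting interval quoted in this post
MAX_MISSED = 3                          # assumption: tolerate up to 18 minutes


def silent_since(report_times, now, interval=REPORT_INTERVAL, max_missed=MAX_MISSED):
    """Return the time of the last report if the instrument looks offline.

    Returns None if the latest report is recent enough, or if there are no
    reports at all (startup can legitimately take a few cycles).
    """
    if not report_times:
        return None
    last = max(report_times)
    if now - last > interval * max_missed:
        return last
    return None


if __name__ == "__main__":
    # Made-up timestamps: ten reports, six minutes apart, then nothing.
    start = datetime(2000, 1, 1, 12, 0)
    reports = [start + REPORT_INTERVAL * i for i in range(10)]
    print(silent_since(reports, now=start + timedelta(minutes=60)))  # None
    print(silent_since(reports, now=start + timedelta(hours=9)))     # last report time
```

Run against the incoming data on shore, a check of this kind might have flagged the Shallow CTD's silence within the first hour or so instead of that evening.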

In this case it's unlikely to have been a programming error since the program deployed to St. Croix a few weeks ago is nearly line-for-line the same as this one.  This leaves the possibility of a failure in the datalogger itself, or perhaps the SIO4 serial-port unit.  I would say a datalogger failure is unlikely as it would have had more immediate and far-reaching effects than just these two CTDs.  An SIO4 failure is a great deal more likely, but this particular SIO4 was also communicating with the WXT (Vaisala "weather transmitter") and the "groundtruth" CT, and both of those (at least on Thursday) appeared to be functioning normally.

So in the case of the CTDs, it could be logger/SIO4 failure, cable failure, instrument failure, or some obscure programming error that for whatever reason doesn't manifest itself at St. Croix.  I should also mention that all of this station's underwater cables were replaced last year so they are less likely to be at fault.

Moving the "deep" sensors:

Back in April UPR's Dr. Roy Armstrong reported a higher degree of biofouling on the "deep" light sensor (BIC) compared to the shallow BIC.  He asked if we would consider moving the deep sensors shallower (we all agreed that the deep BIC and CTD should stay more or less co-located, so both would move together).  I picked up this discussion last week and looked at the numbers.  Roy had suggested 3m shallower but I found only 3.07m separation between the two sets of CTDs, which a 3m move would have all but eliminated, so I replied with my own suggestion that we leave at least 1m of separation between deep and shallow instruments.  After some more back and forth with AOML scientists we all agreed on a 1m - 2m separation between sensors.
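
The arithmetic behind that discussion is simple enough to write out.  The short Python sketch below uses only the 3.07m figure and the 1m - 2m target from this post; the function names are hypothetical and exist purely for illustration.

```python
CURRENT_SEPARATION_M = 3.07  # deep-to-shallow separation quoted in this post


def separation_after_raise(raise_m, current=CURRENT_SEPARATION_M):
    """Separation left after raising the deep sensors by raise_m metres."""
    return round(current - raise_m, 2)


def raise_range(min_sep, max_sep, current=CURRENT_SEPARATION_M):
    """Smallest and largest raise that keep the separation in [min_sep, max_sep]."""
    return round(current - max_sep, 2), round(current - min_sep, 2)


if __name__ == "__main__":
    print(separation_after_raise(3.0))  # 0.07
    print(raise_range(1.0, 2.0))        # (1.07, 2.07)
```

In other words, a 3m move would have left the deep and shallow sensors only about 0.07m apart, while the agreed 1m - 2m separation corresponds to raising the deep sensors by roughly 1.07m to 2.07m.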

Rachel and I ran into problems on Tuesday, however, when we tried to move the platform higher.  Eventually we realized that the "collar" of the platform that hugs the station tube was now broken on one side.  It was built with two stainless steel allthread rods extending several feet out of the platform side (see photo below).  The back-collar piece slides over these rods and the whole thing is secured with washers and nuts in back.  But one of the allthread rods appears to have snapped off during removal.  [I'm not certain how old this mounting platform is.  I know we upgraded the shallow BIC's mount in 2010, replacing the original bare-allthread design, and that at the time the deep BIC's platform was already in place.  So it dates from sometime between 2006 and 2010.]

I spent part of Wednesday trying to come up with a way that we could reuse this mount securely without the allthread rod.  I thought that wrapping a long hose clamp around the one side of the platform would be strong enough, but the 48" hose clamps seemed to be just a little too short to allow this.  Chaining together two 48" clamps would have been way too long (the notched part of the clamp allows only for about 10" of tightening).  I also tried chaining together two clamps of different lengths but the 48" clamps were slightly wider than the shorter clamps and could not be mixed/matched.  In the end I found that one 48" clamp could be forced around the collar with less than an inch to spare.  What I did not realize until Thursday was that the collar does not hug the pylon tightly enough for its two sides to meet, but leaves about an inch of allthread open to the ocean (cf. the fouling pattern in this second picture, below).

Back in the water on Thursday, we quickly realized that the hose clamps weren't going to be long enough.  We tried securing the platform with four-foot plastic zip ties but those clearly weren't going to stand up to even the gentlest of currents.  In the end we gave up on the mounting platform and simply hose-clamped the light sensor directly to the pylon (see photo, below, which shows the final configuration of the deep BIC and CTD).

The parts of the mounting platform were brought back to shore and left in storage beside the CREWS box.  This is the large plastic red box in the shed by the dock (not the diving dock, but the other one) on Isla Magueyes, where we keep the climbing rungs, harness, and other supplies used during our visits.

Miscommunication with UPR:  For this trip I was looking forward to working for the first time with Diana Marcela Beltrán, who was our designated UPR/CREWS liaison.  This was a role she inherited from Wess Merton at some point in the last year or two.  As far as I'm aware this is purely a voluntary role, usually filled at UPR by grad students, and AOML to my knowledge does not pay UPR to provide this support.  So I try not to be too demanding.

After our last trip to St. Croix was completed (July 29th - August 2nd) I contacted Diana on August 5th to ask what her schedule would be like for the rest of the month.  At the time I was ccing Francisco Pagán, who had been our main UPR contact person since the station first went live in January of 2006, but later I learned that Francisco's contract was now ending and he was planning to leave UPR.  I didn't see Francisco during our week at UPR although I don't know whether this was because he'd already left or just because we were all so busy with our work.

Diana replied to my followup message (August 8th), saying she was at that time on vacation in Colombia but she would be back on August 26th to work with us, so we arranged our travel plans accordingly.  Our plans called for working with Diana on the 27th and 29th (Tuesday and Thursday).

After we checked into our hotel on the evening of Monday the 26th I sent an email to Diana to coordinate a meet-up time for the next morning.  Late that same evening, however, Diana replied by email that her return flight from Colombia had been changed "until Friday" and she would be unable to work with us.  She had made arrangements "last week" to have us work in the field with Nibo instead.  As far as I know she had not told anyone at AOML about these changes in plans the week before.

Working with Nibo was good news in that I had worked with him once last year, and he was therefore familiar with how we needed the boat to be positioned/anchored.  But on that earlier trip I had been traveling with an AOML diver who happened to be fluent in Spanish, and this time neither Rachel nor I could come up with more than a few words of Spanish between us.  And Nibo, it seemed, could not speak or understand English.

This was not generally a problem for our surface/aerial work, because I could communicate to Rachel what was going on and Nibo could step forward and assist Rachel as needed by watching what she was doing.  But the language barrier presented more difficulties when diving because Rachel and I were both in the water and language was our only tool for communicating our needs to Nibo in the boat.  We'd described generally what we'd planned before we started our dive, but when the BIC mounting platform failed we had to change the order of our work and we had no way of telling Nibo what to do.  Later (for example) we needed a pair of pliers, and on Thursday I needed another hose clamp, which I knew to be alone in a ziploc bag in one of my red boxes on the boat but I had no way of guiding Nibo to find what I needed.  Eventually we muddled through with simple words and gestures but at several points we decided to simplify our plans underwater rather than attempt to communicate more complex concepts across our language barrier.  I should mention that this is in no way Nibo's fault, because he was patient and helpful and friendly to the degree that we were able to communicate with him.  But this is something that should be raised as a red flag for future visits.

There was one moment on Thursday night when I was debating whether to try rescheduling our flights home and see if UPR could bring us back out to the station one more time.  Oddly enough, as the evidence mounted about more things going wrong (the second CTD failed at about 11pm local time), it became less urgent to return to the station on this visit simply because there was less and less likelihood that we could fix what was wrong.  But also weighing on my decision was the fact that, without Diana or Francisco around, I had nobody at UPR that I could contact Thursday evening to ask about an unplanned Friday boat trip.  We thought maybe we could just show up at the university Friday morning and ask around but the likelihood of success of that approach was not great.  This was not a language-barrier problem so much as it was a void of support, with Diana and Francisco unavailable but nobody nominated to take their place.

Summary:  The station is offline and probably without power, although it may have returned to its only-during-the-day solar-panel-fueled activity.  As best I can tell the two CTDs are not communicating with the datalogger and there's a good chance that they aren't even logging data to their own flash memory storage.  The deep BIC platform is broken and the sensor was left hose-clamped directly to the station.

I am short on ideas about what to try next, but I think at a minimum we should replace the entire brain unit with all-new equipment on our next visit.  Even that may not be enough if the permanently-installed solar panels are faulty, and I do not believe the solar panels can be removed and/or replaced without taking the entire station out of the water for repair on land.  I also think we should wait until hurricane season has ended before scheduling our next trip to this station.

This update was written by Mike Jankulak, with photos taken by Rachel Kotkowski.