Cheryl’s List #159 – June 30, 2012


1.   New Red Alert – Potential Loss of SPOOL Data
2.   Real Storage Management (RSM) Problem

1.  New Red Alert – Potential Loss of SPOOL Data

2012.06.29 JES2 Potential Loss of Spool Data on z/OS 1.11 and 1.12.

The fix for APAR OA36256 (RSU1112 PTFs UA61942, UA61943, and UA61944 on HJE7760, HJE7770, and HJE7780 respectively) widened a timing window during JES2 initialization processing such that an initializing JES2 member may not obtain the correct status of other multi-access spool (MAS) members. As a result, that member's view of a spool volume may differ from the view held by the rest of the MAS. Consequently, later HALTING or DRAINING actions against the spool volume may result in incomplete cleanup.

PE APAR OA39737 will address the timing window and ensure the initializing member has the most accurate status of the MAS during initialization. In addition, APAR OA38016 will address spool errors caused by the timing window that may result in potential loss of spool data. These types of spool errors are already corrected in z/OS 1.13.

Please see OA39737 and OA38016 for more details or updates.

Recommended Actions:

    • If the PTF for OA36256 is applied, please avoid putting a spool volume into DRAINING or HALTING state. New volumes can be added or started without exposure.
    • If a Spool Drain or Halt must be done, Level 2 can check dumps of JES2 to determine if the spool volume is exposed.
    • If OA36256 is applied and a spool volume is already in DRAINING or HALTING state, please remove OA36256 and then (rolling) WARM start each JES2 member.
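The three recommendations above can be read as a small decision table. Here is a minimal Python sketch of that reading. The function name, the state labels, and the returned strings are our own illustrative choices and not anything IBM supplies. The alert also gives no guidance for systems without the PTF, so that branch is an assumption.

```python
def recommended_action(oa36256_applied: bool,
                       volume_state: str,
                       must_drain_or_halt: bool = False) -> str:
    """Map one spool volume's situation to the Red Alert's recommended action.

    volume_state uses hypothetical labels: "ACTIVE", "DRAINING", or "HALTING".
    """
    if not oa36256_applied:
        # Assumption: every action in the alert is conditioned on the PTF.
        return "no action listed in the alert"
    if volume_state in ("DRAINING", "HALTING"):
        # Third bullet: the volume is already in an exposed state.
        return "remove OA36256, then (rolling) warm start each JES2 member"
    if must_drain_or_halt:
        # Second bullet: the drain or halt cannot be avoided.
        return "ask Level 2 to check JES2 dumps for exposure first"
    # First bullet: leave the volume alone.
    return "avoid DRAINING or HALTING; adding or starting volumes is safe"
```

For example, a site with the PTF installed and a volume already draining would land on the third bullet, while a site that has not drained anything simply holds off.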

The applicable APARs are:

OA36256 (z/OS 1.11-1.13, 8Sep2011) – JES2 ABEND $XF1 Possible JESXCF SDC5 SEC5. The Cross-system Device Status PCE may inadvertently use out of date member name information when sending JESXCF messages to newly active members. As a result, JES2 may receive an invalid Return code back from JESXCF and issues a $XF1 abend.
OA38016 (z/OS 1.11-1.13, OPEN 22Aug2012) – SPOOL Volume is Marked Allocated but Really Is Not Resulting in Erroneous Message. 
OA39737 (z/OS 1.11-1.13, OPEN 8Jun2012) – After OA36256, JES2 Fails to Completely Clean Up a SPOOL Volume.

Note that OPEN APARs require an IBM login to access.

IBM issues Red Alerts for extremely critical problems. You should consider subscribing to their service at www.software.ibm.com/webapp/set2/sas/f/redAlerts/home.html.

2.  Real Storage Management (RSM) Problem

In Cheryl’s List #155, we mentioned an RSM problem where the Hiperspace frame counts in central continued to grow:

The problem is identified in RMF Paging Reports where the Hiperspace central storage frame counts continue to grow after an IPL. It’s even possible for these values to exceed the actual number of central storage frames on the system. It’s also noticeable because the MIN, MAX, and AVG frames are consistently very similar. The solution from the PMR is to recycle the initiators that run large DFSORTs that use Hiperspaces. A working hypothesis is that an RSM control block (IARRAX) field, RAXHRECT, is not getting decremented under some conditions (possibly when a standard hiperspace page from a large DFSORT is moved from a real frame to an auxiliary storage slot).
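If you keep the Hiperspace frame counts from successive RMF intervals, the three symptoms described above are easy to test for. The following is a rough Python sketch, not an RMF interface: the function name, the (min, max, avg) tuple layout, and the 1% spread tolerance are all assumptions made for illustration. You would have to extract the values from your own reports.

```python
from typing import List, Sequence, Tuple

# One RMF interval: (min, max, avg) Hiperspace frames in central storage.
Sample = Tuple[float, float, float]

def leak_symptoms(samples: Sequence[Sample],
                  total_frames: int,
                  spread_tolerance: float = 0.01) -> List[str]:
    """Return which of the three described symptoms appear in the samples."""
    symptoms = []
    avgs = [avg for _, _, avg in samples]

    # Symptom 1: a count larger than the central storage frames that exist.
    if any(mx > total_frames for _, mx, _ in samples):
        symptoms.append("exceeds central storage")

    # Symptom 2: the average never drops from one interval to the next.
    never_drops = all(b >= a for a, b in zip(avgs, avgs[1:]))
    if len(avgs) >= 2 and never_drops and avgs[-1] > avgs[0]:
        symptoms.append("grows without dropping")

    # Symptom 3: MIN, MAX, and AVG stay within a small tolerance of each other.
    if samples and all(avg > 0 and (mx - mn) <= spread_tolerance * avg
                       for mn, mx, avg in samples):
        symptoms.append("min/max/avg nearly identical")

    return symptoms
```

A healthy system would normally return an empty list, because Hiperspace usage rises and falls as sort jobs start and end.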

Then in Cheryl’s List #156, we talked about a different Hiperspace problem where the count of Hiperspace pages in Real storage is larger than the TOTAL FRAME count. That one was resolved with APAR OA38295. We thought that the APAR might resolve the first problem, but it didn’t.

Our reader finally obtained a resolution for the original problem with APAR OA39215.

OA39215 (z/OS 1.11-1.13, 20Jun2012) – Count of Hiperspace Pages in Real Storage Incorrect. From the APAR: “The problem is a failure to decrement the count of hiperspace pages in real storage. This is happening in global steal when we attempt to steal, and page out, the CHANGED hiperspace pages to auxiliary storage. The IARUEPAG entry point fails to decrement RAXHRECT count when it is setting up the output paging I/O for the changed hiperspace page.” This causes an ever-increasing value in the RMF report, and the count may persist even after the job has ended and all of its Hiperspaces are deleted.
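To see why a single missed decrement produces a count that only ever climbs, here is a toy Python model. It is in no way RSM code. The class and method names are invented for illustration, and the only thing taken from the APAR is the idea that the page-out path for changed pages skipped the decrement of the reported count.

```python
class ToyHiperspaceCounter:
    """Toy model only: tracks real frames versus a reported count."""

    def __init__(self, fixed: bool):
        self.fixed = fixed
        self.real_frames = 0  # Hiperspace pages actually in real storage
        self.reported = 0     # stands in for the RAXHRECT count

    def page_in(self, n: int) -> None:
        """Pages arrive in real storage; both numbers go up."""
        self.real_frames += n
        self.reported += n

    def steal_changed(self, n: int) -> None:
        """Changed pages are stolen and paged out to auxiliary storage."""
        n = min(n, self.real_frames)
        self.real_frames -= n
        if self.fixed:
            # The decrement that the APAR text says was being missed.
            self.reported -= n


def run(fixed: bool, cycles: int = 3, pages: int = 100) -> ToyHiperspaceCounter:
    """Simulate several large sorts, each paged in and then stolen."""
    counter = ToyHiperspaceCounter(fixed)
    for _ in range(cycles):
        counter.page_in(pages)
        counter.steal_changed(pages)
    return counter
```

With the defaults, the unfixed model ends with no real frames in use but a reported count of 300, while the fixed model reports 0. That matches the symptom of a count that keeps growing and can eventually exceed the real frames on the system.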

The problem goes back to z/OS 1.8, but only z/OS 1.11 and above are corrected.

Stay Tuned!
