Cheryl’s List #123 – May 14, 2008

May 14, 2008 | Cheryl's List

1.  Important APARs
2.  Request for More SMF Information
3.  Status of Tuning Letter 2008 No. 2

1.  Important APARs

Here are some APARs that I think are important:

  • Ken Williams, an independent consultant for CPT Global, pointed out a new OPEN APAR (5May2008) that I think is important because of the large number of MIPS that can be lost.  Here’s his email:

Here’s an unsolved APAR without a PTF that your readers might want to be aware of – it hit us hard here and could be worth more than 250 MIPS in our busiest CICS regions.  I think there’s more to it than just this PTF, something to do with the vast number of tasks and task switches in the region.

APAR OA24094 In a CICS Transaction Isolation Environment (ie. Using Subspaces) High Performance Overhead In IAXVP Doing CSP.

The APAR description is:

In a CICS transaction isolation environment (ie. subspaces are being utilized), very high performance overhead occurs in IAXVP during CSP processing due to the private storage getmains being done which drives page table initialization.  The cause of the high performance impact is the PTLB (purge translation lookaside buffers) which occurs upon the CSP instruction.  Note:  This problem will not be encountered if NOT running in a CICS transaction isolation environment where subspaces are NOT utilized. [Ed. Note – Are there too many ‘NOT’s in that last line?]

Ken would like to thank the respondents on IBM-MAIN who put him on the right track for this APAR.

  • Jerry Urbaniak from Acxiom pointed out an IBM Red Alert that showed up on April 11, 2008:

A sysplex-wide outage can occur when using CFRM MSGBASED processing after the fix for APAR OA21917 has been applied.

APAR OA21917 (z/OS 1.7-1.9, 23Jan2008, HIPER) – ABEND00C RC13260001 IXCL2EVT Many Alter Begin/End Events Stacked.  PE.

APAR OA24563 (z/OS 1.7-1.9, OPEN 21Apr2008) – ABEND00C RSN130E0001, ABEND00C RSN13260001, and/or RC=10X From Parallel Sysplex Services.

See https://www14.software.ibm.com/webapp/set2/sas/f/redAlerts/ to get more information.

  • Another Red Alert showed up on Friday (9May2008):

Sysplex-wide impact due to loop in GRS on z/OS 1.9 and higher

In z/OS 1.9 and running in a GRS-Star configuration, GRS recovery for GQSCAN retries with incorrect registers.  This can result in an ABEND0A0 followed by repetitive ABEND0C6 or ABEND0C4 in ISGQNX.  As a result of the program check loop in ISGQNX’s recovery, no GRS work is able to run on the failing system.  This may cause sysplex-wide impact for the following reasons: 

    • Global resources held by the failing system cannot be released causing tasks that request the resources on other systems to wait.
    • If the failing system is a GRS Lock Manager, other systems may not be able to complete their XES requests.
    • Global GQSCANs may time out.
    • Global TEST ENQs cannot be processed.
    • No list ENQ requests can be processed on other systems if the failing system is holding the GRS list lock.

Symptoms may also include:

    • High CPU usage in GRS and *MASTER* on the failing system.
    • Message ISG361I indicating delays on other systems in the Sysplex.
    • D GRS,C is unable to complete on the problem system and Message ISG343I only returns local resource data on other systems.

Please see APAR OA24741 (z/OS 1.9, 7May2008, HIPER, ISGQNX Recovery Recursively Suffers ABEND0C6) for further details.

  • MXG Release 25.25 is important for sites going to z/OS 1.9.  There have been reported problems with going to z/OS 1.9 without this yearly renewal.  An even better release for z/OS 1.9 is MXG Release 26.01.  Plus, as usual, there are some really neat new features in the latest MXG releases.  See www.mxg.com.
  • Scott Barry of SBBWorks reports that a z10 customer experienced problems when turning on HiperDispatch.  The problem involves floating point arithmetic, and can occur in any program that does a lot of floating point work.  I’ve heard that it has shown up in both SyncSort and SAS.  The problem isn’t in the vendor products; it’s in how the system handles the floating point registers, as the APAR below describes.  So please don’t consider turning on HiperDispatch until you have applied the fix for this APAR.

OA24322 (z/OS 1.7-1.9, HIPER, 18Mar2008) – Floating Point Regs Corrupted When Running With HiperDispatch on D/T 2097.  When running with HiperDispatch=ON, if a job that uses floating point arithmetic gets interrupted by an SRM timer pop, the floating point registers may be corrupted because IRABABAL does not save or restore floating point registers.  This can result in incorrect output for the job that was interrupted.
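For readers who find it easier to see this kind of failure in code, here is a toy Python model of what the APAR describes.  It is only an illustration and not z/OS code: the class, the handler functions, and the choice of register 0 as a scratch register are all hypothetical.  It shows how a running total held in a floating point register can silently go wrong when an interrupt handler uses that register without saving and restoring it.

```python
class ToyCpu:
    """Hypothetical stand-in for a processor with 16 floating point registers."""

    def __init__(self):
        self.fpr = [0.0] * 16


def handler_without_save(cpu):
    # Assumption for illustration: the handler uses FPR0 as scratch space
    # and returns without restoring the interrupted job's value.
    cpu.fpr[0] = -1.0


def handler_with_save(cpu):
    # Same scratch use, but the registers are saved on entry and restored on exit.
    saved = list(cpu.fpr)
    try:
        cpu.fpr[0] = -1.0
    finally:
        cpu.fpr[:] = saved


def sum_with_interrupt(values, handler, interrupt_at):
    """Add up values in FPR0, taking one 'timer pop' before item interrupt_at."""
    cpu = ToyCpu()
    for index, value in enumerate(values):
        if index == interrupt_at:
            handler(cpu)
        cpu.fpr[0] += value
    return cpu.fpr[0]


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    print("registers not saved:", sum_with_interrupt(data, handler_without_save, 2))
    print("registers saved:    ", sum_with_interrupt(data, handler_with_save, 2))
```

The job itself never fails in this model; it simply produces a wrong answer, which is why this kind of problem is so hard to spot.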

2.  Request for More SMF Information

In our last Cheryl’s List, I sent out the following request:

When discussing the new SMF logger function at SHARE, we realized that nobody seems to have a good idea of what volume of SMF data is being used these days.  So at my ‘Hot Flashes’ session, I asked people to email or fax me a copy of any output from IFASMFDP (which shows types of records, number of records, record size, and interval).  This can simply be the output from your normal run, or if you want to make a separate run, you could use either a full day or peak hours.

I’d appreciate any and all input, and will publish my findings in a later Cheryl’s List.  All materials will be kept confidential.

At least 50 people responded, and I really appreciate the information.  To answer the question about the volumes, most people are collecting SMF data at less than 1 MB/second, while a few are collecting up to about 3 MB/second.  But the reports are also showing me a lot of interesting things (for example, sites that turn on DB2 records are seeing over 80% of all SMF data being created by DB2!), and I’d like to share the findings.  I’d also like information from even more installations before I complete my analysis.  So if you haven’t sent in your SMF reports, I’m still looking for more data.  Most of the reports I’ve received are for 24-hour periods.  But I’m really interested in seeing the ratio between peak rates and 24-hour rates, so if you have a chance to send both, I would be very excited.  (And for those people who have been kind enough to send the 24-hour summaries, it would be neat to see your peak periods too.)

One more request while I’m asking.  Would you also send me a copy of your SMFPRMxx member of SYS1.PARMLIB (or output from console command ‘d smf,o’)?  There are some things that I can’t deduce from the dump listing.  If you’ve already sent me the report output, just a listing of your parms would be terrific.  I think you’ll all be interested in the results.
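If you’d like to work out your own numbers before sending a report, the arithmetic behind the rates mentioned above is simple.  The Python sketch below is just one way to do it, and it makes some assumptions: the class and field names are hypothetical, you key in the record counts and average record lengths by hand from the IFASMFDP summary, and 1 MB is taken as 1,000,000 bytes.  The sample figures are made up for illustration and are not from any installation.

```python
from dataclasses import dataclass


@dataclass
class RecordTypeSummary:
    """One line of a summary report, keyed in by hand (hypothetical layout)."""
    record_type: int     # SMF record type, such as 30 or 101
    records: int         # number of records of this type in the interval
    avg_length: float    # average record length in bytes


def total_bytes(rows):
    """Approximate bytes written: record count times average length, summed."""
    return sum(row.records * row.avg_length for row in rows)


def rate_mb_per_second(rows, interval_seconds):
    """Average SMF write rate over the interval, assuming 1 MB = 1,000,000 bytes."""
    return total_bytes(rows) / interval_seconds / 1_000_000


def percent_by_type(rows):
    """Share of the total volume contributed by each record type, in percent."""
    total = total_bytes(rows)
    return {row.record_type: 100.0 * row.records * row.avg_length / total
            for row in rows}


def peak_to_average(peak_rate, daily_rate):
    """Ratio of the peak-period rate to the 24-hour rate."""
    return peak_rate / daily_rate


if __name__ == "__main__":
    # Made-up sample figures for a 10-second interval.
    sample = [
        RecordTypeSummary(record_type=30, records=1000, avg_length=1000.0),
        RecordTypeSummary(record_type=101, records=4000, avg_length=1000.0),
    ]
    print("MB/second:", rate_mb_per_second(sample, interval_seconds=10))
    print("Percent by type:", percent_by_type(sample))
```

Running the same calculation once for a full day and once for your peak hours gives the two rates needed for the peak-to-average ratio.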

3.  Status of Tuning Letter 2008 No. 2

I’m still working on the next issue of the newsletter, but it’s taking me a little longer than usual.  As many of you know, I’ve had some back problems and they’ve been causing me some grief.  My instructions are not to sit for more than ten minutes at a time.  Can you imagine how disruptive that can be?  Anyway, I expect the issue to be out soon.  The main topics in this issue are:  SMF Update – Part 1 (buffers, intervals, SMF logger); a new ‘Back to the Basics’ continuing column for those new to z/OS; an analysis of z10 processors; z/OS 1.9 Performance; a SHARE Trip Report; and lots and lots of interesting APARs and links.  I’m sure that you’ll like it.

Thanks for your continued patience and support.

Stay Tuned!
