Cheryl’s List #136 – August 25, 2009

by | Aug 25, 2009 | Cheryl's List

1.  About Cheryl Watson’s Tuning Letter 2009 No. 4
2.  SHARE in Denver
3.  SMF Status Correction
4.  RMF Interval Recommendation
5.  Tuning Letter 2009 No. 4 Update
6.  IBM Red Alert

1.  About Cheryl Watson’s Tuning Letter 2009 No. 4

The forty-six page 2009 No. 4 Tuning Letter was emailed to paid subscribers on August 20th. You may visit our Web site at www.watsonwalker.com to obtain subscription information. The following is our Management Summary page from that issue, talking about some of the contents of this latest Tuning Letter:

SMF Management
SMF data is the basis for all tuning, capacity planning, chargeback, and performance work. Our SMF Series — Part 6 shows how to best manage the huge amount of data produced by DB2, CICS, and Workload Manager. Did you know that there’s a WLM type 99 record that you should always keep? I also cover the security records that are so very important to auditors. And I discuss controlling DB2 data, by far the most important concern for most installations. I’ve seen sites where 90% of SMF records are produced by DB2.

Offloading Work to Specialty Processors
A recent press release by a software vendor has led to a lot of speculation about how much work can be offloaded from the general purpose processors to specialty processors, such as the zIIPs and zAAPs. The vendor says that you can offload 50% of your work and reduce your software costs by 20%. But because they haven’t used the IBM-approved interfaces, nobody can say whether IBM will accept the offloading. And according to a letter from IBM, there is a risk in attempting to use this technique to reduce costs. See my full discussion of this situation in our What’s New? section.

Elsewhere in this Issue
Our User Experiences section provides a warning to be careful when migrating to z/OS 1.10 because the documentation for WLM needs updating. There are several references specifically for managers relating to use of the System z machines for Linux in our Publications section. We’ve also noticed recently that many of IBM’s New Function APARs contain fixes that look like they should be HIPER. Because New Function APARs are often overlooked, we’ve highlighted some of the more critical ones for you.

2.  SHARE in Denver

This week, I’m at the SHARE conference in Denver, and I hope to see many of you there. I especially like the “Summer” conference because IBM provides the specifics of the release that becomes available in September — in this case, z/OS 1.11. And if you can’t make it to SHARE, you can look at most of the presentations online. Go to www.share.org for the proceedings. (Some presentations are not posted until the end of the week.) I think that SHARE is some of the best education you can find. Here are three sessions I’ll be at:

  • Session 2109 — Cheryl’s Hot Flashes #22, Friday, 9:30 am
  • Session 2149 — Meet the Experts: WLM, Performance & Capacity Planning Subjects, Thursday, 4:30 pm
  • Session 2235 – z/OS Mean Time to Recovery Improvements, Wednesday, 9:30 am

3.  SMF Status Correction

In Tuning Letter 2009 No. 3, I recommended writing the type 23 records (SMF buffers and workload characterization) more frequently by using the STATUS parameter of SMFPRMxx. The default is STATUS(010000) – one hour, but I recommended 10 or 15 minutes. You can also use STATUS(SMF,SYNC), which indicates that type 23 records are written at the SMF interval (normally 30 minutes) and synchronized to the RMF synchronization (normally 0). I mistakenly recommended that you use STATUS(001000,SYNC), which is an invalid combination. Neal Keller of Reed Elsevier Technology Services pointed out that you can use SYNC only when you’re using the SMF global interval. Because the SMF global interval is usually set at thirty minutes, you have to choose between synchronizing the records with (SMF,SYNC) or using smaller intervals and not synchronizing them with (001000). Thanks, Neal! So my recommendation has changed to STATUS(SMF,SYNC).
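To make the rule concrete, here is a minimal Python sketch that checks a STATUS value against the two points described above: a fixed interval is written as hhmmss, and SYNC is accepted only together with SMF. The function name parse_status is hypothetical, and the sketch models only the rule discussed in this item, not the full SMFPRMxx syntax.

```python
import re

def parse_status(value):
    """Check the text inside STATUS(...) against the rule in this article.

    Returns ("SMF", sync_flag) or (interval_in_seconds, False).
    Hypothetical helper: it covers only the hhmmss / SMF / SYNC cases
    discussed above, not every option SMFPRMxx may accept.
    """
    parts = [p.strip().upper() for p in value.split(",")]
    interval = parts[0]
    sync = len(parts) > 1 and parts[1] == "SYNC"

    if interval == "SMF":
        # Use the SMF global interval; SYNC is allowed here.
        return ("SMF", sync)

    if not re.fullmatch(r"\d{6}", interval):
        raise ValueError(f"expected hhmmss or SMF, got {interval!r}")
    if sync:
        # The invalid combination called out above, e.g. 001000,SYNC.
        raise ValueError("SYNC can be used only with the SMF global interval")

    hh, mm, ss = int(interval[0:2]), int(interval[2:4]), int(interval[4:6])
    return (hh * 3600 + mm * 60 + ss, False)


print(parse_status("010000"))    # the default: one hour, in seconds
print(parse_status("001000"))    # ten minutes, not synchronized
print(parse_status("SMF,SYNC"))  # the corrected recommendation
```

Passing "001000,SYNC" to this helper raises ValueError, which mirrors the invalid combination I originally recommended.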

One of the reasons that the type 23 records are more important today is that IBM is now using them to collect new data on workload characterization. I discussed this in our Tuning Letter 2009 No. 2. Here is an extract from that newsletter regarding a SHARE presentation given by John Burg from IBM’s WSC:

SMF Type 23 – The second item [that John presented] was a description of the SMF type 23 record.  While this record used to simply contain SMF buffer statistics, it was expanded last year to contain workload characteristics that might help IBM classify the type of workloads that run on your machine.  This would allow you to better use the published LSPR findings. I discussed this record in our Tuning Letter 2008 No. 2 (page 24), and asked that customers contact Gary King and provide him data from your site. John’s now taken on that work and has asked that customers contact him to provide the SMF type 23 data. APAR OA22414 provides the data for workload analysis and a description of the new fields. New APAR OA27161 adds “delta” counters for the new fields. Once you’ve applied these two APARs, please contact John in order to send him your data. They need many more customer samples for their analysis. As John says, it’s an “opportunity to ensure that your data is used to influence clustering analysis” to see if workload characterization can be used for zPCR or other tools. John’s email is jburg@us.ibm.com, and he’s looking for volunteers who will provide him with up to 24 hours of SMF 23s, 70s, 72s, and 113s per LPAR.

    • OA22414 (z/OS 1.8-1.9, 25Jun2008) — SMF23 Statistics. This APAR produces an SMF type 23 record that contains new workload characteristics in addition to I/O density (TCB & SRB dispatches, DASD I/O, and RSM services such as getmains, page faults, fixed storage, and free storage) that may prove useful in the future for matching production to LSPR workloads.
    • OA27161 (z/OS 1.8-1.10, 4Feb2009) — New Function – Additional SMF Type23 Statistics. Adds “delta” counters for the previously added new fields.

z10 CPU Measurement Facility (CPU MF) – John also described the new hardware instrumentation available on the z10 ECs (GA2) and BCs. The software component is called the Hardware Instrumentation Services (HIS). Because the hardware is collecting the data, there is nearly no impact to the system being measured. IBM can see uses for this in future workload characterization, improving ISV and IBM products, and application tuning.

An SMF type 113 record can be produced from HIS, although there are currently no programs to produce reports from this record or the UNIX files that contain the samples. We don’t know if these will come later, or be incorporated into a product. ISVs will probably use this facility a lot. I expect to provide a longer article on the CPU MF after more people have tried it. If you do use it, please tell me what you find.

If you’re going to jump in and use John’s handout to run HIS, then I’ve got some suggestions:

1.  Study the documentation carefully before you begin. In addition to John’s presentation, you can also look at:

Ed Jaffe’s SHARE session 2839 (slides 16-21). This has some good examples of output.

Greg Daynes’ SHARE session 2848 (pages 54-55). This has a good set of steps for implementation.

IBM Research article – IBM System z10 performance improvements with software and hardware synergy at http://www.research.ibm.com/journal/rd/531/jackson.pdf.

SA22-7627-19 – MVS Commands, “Setting up hardware event data collection.”

SA23-2260-00 – The Set-Program-Parameter and CPU-Measurement Facilities.

SA23-2261-00 – IBM The CPU-Measurement Facility Extended Counters Definition for z10.

[Editor’s Note – The hyperlinks refer to SHARE sessions from the Austin SHARE.]

2.  Apply the required APARs: OA25755, OA25750, and OA25773 (z/OS 1.7 – 1.10, Oct2008), and ensure that you have the correct microcode level installed.

3.  You can probably safely use COUNTERS mode, which has low overhead (less than 1/100th of a second for the HIS address space during a 15-minute interval). But be careful of using SAMPLING mode because it can produce HUGE volumes of I/O unless you change the sampling frequency. The default of SAMPFREQ=800000 and DURATION=10 produces 8 million samples in 10 minutes. Start with a small value (e.g. SAMPFREQ=320), and never go higher than SAMPFREQ=130000 for a smaller z10 BC.
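The sample volumes in item 3 can be checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes SAMPFREQ is a per-minute rate and DURATION is in minutes, which is what the 8 million figure above implies, and the function name total_samples is hypothetical.

```python
def total_samples(sampfreq_per_minute, duration_minutes):
    """Samples collected in one run, assuming SAMPFREQ is a per-minute rate."""
    return sampfreq_per_minute * duration_minutes


DURATION = 10  # minutes, the default quoted above

default_total = total_samples(800_000, DURATION)  # default SAMPFREQ
small_total = total_samples(320, DURATION)        # suggested starting value
bc_cap_total = total_samples(130_000, DURATION)   # suggested ceiling for a smaller z10 BC

print(f"default:      {default_total:>9,} samples")
print(f"SAMPFREQ=320: {small_total:>9,} samples")
print(f"BC ceiling:   {bc_cap_total:>9,} samples")
```

Under these assumptions, starting at SAMPFREQ=320 collects 2,500 times fewer samples than the default for the same duration.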

I’m very excited about the possibilities that the CPU MF can provide, and I hope that I’ll have an opportunity to either use it or hear from a customer who has used it. This was one of my favorite things from SHARE.

4.  RMF Interval Recommendation

In our Tuning Letter 2008 No. 2 (pages 8 & 9), I made the following recommendation regarding the length of the RMF interval:

Recommendation:  My new recommendation is that you record RMF to match the length and synchronization of INTVAL (30 minutes) and SYNCVAL (0). You will need to realize that everything will tend to look better with 30-minute intervals, as compared to 15 minutes. For example, if you are doing 10-minute intervals and get the following response times for a volume (in ms):  7, 25, and 5, then a program looking for device times over 20ms will identify this device. But a 30-minute interval would produce an average response time of 12ms, which wouldn’t be noticed.

Kathy Walsh from IBM’s WSC, along with several readers, told me that this was a bad recommendation because the longer interval hides too many performance problems. They’re correct, of course. While I had pointed out the risk of using larger intervals, I should never have changed my years-long recommendation of using ten to fifteen minute intervals for RMF.

In our Tuning Letter 2008 No. 4 (page 33), I included a chart and explanation from Chuck Hopf showing why he disagreed with my recommendation. I especially like his conclusion: “The larger the interval the more the peaks and valleys get washed out. I know IBM would like us to plan based on averages but if I stick my head in a hot oven and my feet in the freezer, my average temperature is still fine.”
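The washing-out effect is easy to reproduce with the three response times from the recommendation quoted above. The following Python sketch is illustrative only: the function names are hypothetical, and it uses a simple mean, which assumes each 10-minute interval had the same number of I/Os (a real report would weight each interval by its I/O count).

```python
from statistics import mean

THRESHOLD_MS = 20                # the "device times over 20ms" check
ten_minute_samples = [7, 25, 5]  # response times (ms) from the example above


def flagged(intervals, threshold=THRESHOLD_MS):
    """Return the interval values that exceed the threshold."""
    return [v for v in intervals if v > threshold]


def coarsen(samples, group_size):
    """Merge consecutive intervals into longer ones using a simple mean."""
    return [mean(samples[i:i + group_size])
            for i in range(0, len(samples), group_size)]


thirty_minute = coarsen(ten_minute_samples, 3)

print("10-minute intervals over threshold:", flagged(ten_minute_samples))
print("30-minute average (ms):", round(thirty_minute[0], 1))
print("30-minute intervals over threshold:", flagged(thirty_minute))
```

The 25ms peak is caught at the 10-minute level, but the single 30-minute value of roughly 12ms falls under the threshold, so nothing is flagged.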

5.  Tuning Letter 2009 No. 4 Update

On page 40 of Tuning Letter 2009 No. 4, I used the wrong link for the MXG site. Here is the corrected paragraph:

A Serendipitous Life. The IBM Systems Magazine May/June 2009 issue contains a wonderful article about a wonderful man, Dr. Barry Merrill, President and founder of Merrill Consultants (http://www.mxg.com).

http://www.ibmsystemsmag.com/mainframe/mayjune09/stoprun/25167p1.aspx

Here are two APARs that were updated since I included them in the newsletter:

PK64852 (z/OS 1.9–1.11, 24Jul2009, Closed UR1) — New Function. The APAR was taken to avoid a hang condition between two processes that use pipes or sockets for communication. The problem could not be reproduced, but the documentation is updated for C/C++ programmers to show how to prevent such problems. [Page 34, New Function APARs]

PTFs were issued on August 13, 2009. The note about the problem not being reproducible was removed.

OA18461 (z/OS 1.9–1.10, 29Jul2009, Closed UR1) and PK75626 (DB2 V9, 8Aug2009, OPEN) — Updates to WLM Dynamic Buffer Pool Management. WLM support for DB2 Buffer Pool Management was supposed to be available starting with z/OS 1.9. Unfortunately, the WLM support for this does not work as expected. This is closed as UNREPRODUCIBLE IN NEXT RELEASE. [Page 34, Hiper & Performance APARs]

PTFs were issued for OA18461 (for z/OS 1.9-1.11) on August 12, 2009. The note about the problem not being reproducible was removed.

6.  IBM Red Alert

IBM issues Red Alerts for especially important APARs, and the most recent one was in early July. You may subscribe to their service at http://www14.software.ibm.com/webapp/set2/sas/f/redAlerts/home.html. In case you don’t subscribe to these alerts, you should at least be aware of them.

2009.08.24 — Possible data loss for z/OS 1.8, 1.9 and 1.10 with fix and HSM PATCH enabled for OA22507 [z/OS 1.8-1.9, 8Nov2007 — GDG Base Held for Too Long by HSM Migration]. The patch allowed changing GDG scratch processing during migration. When enabled, HSM may erroneously invalidate data sets. The data may be lost if no backup exists. For more information, see APAR OA30149 (HIPER, OPEN) — The TTOC Entry for a GDS Can Be Erroneously Marked Invalid When HSM Space Management Migrates the GDS to ML2.

Stay Tuned!
