Document Audience: | INTERNAL |
Document ID: | I0755-1 |
Title: | Tuning the ecache_scan_rate parameter of the Solaris cache scrubber provides improved Ecache parity error protection on non-mirrored SRAM UltraSPARC II-based systems |
Copyright Notice: | Copyright © 2005 Sun Microsystems, Inc. All Rights Reserved |
Update Date: | 2004-01-07 |
---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------
FIELD INFORMATION NOTICE
(For Authorized Distribution by SunService)
FIN #: I0755-1
Synopsis: Tuning the ecache_scan_rate parameter of the Solaris cache scrubber provides improved Ecache parity error protection on non-mirrored SRAM UltraSPARC II-based systems
Create Date: Aug/09/02
Keywords:
Tuning the ecache_scan_rate parameter of the Solaris cache scrubber provides improved Ecache parity error protection on non-mirrored SRAM UltraSPARC II-based systems
SunAlert: No
Top FIN/FCO Report: Yes
Products Reference: Solaris cache scrubber
Product Category: Software / Solaris
Product Affected:
Systems Affected
----------------
Mkt_ID Platform Model Description Serial Number
------ -------- ----- ----------- -------------
- E3000 ALL Ultra Enterprise 3000 -
- E3500 ALL Ultra Enterprise 3500 -
- E4000 ALL Ultra Enterprise 4000 -
- E4500 ALL Ultra Enterprise 4500 -
- E5000 ALL Ultra Enterprise 5000 -
- E5500 ALL Ultra Enterprise 5500 -
- E6000 ALL Ultra Enterprise 6000 -
- E6500 ALL Ultra Enterprise 6500 -
- E450-HPC ALL Ultra Enterprise 450 HPC -
- A25 ALL Enterprise 450 -
- A33 ALL Enterprise 420R -
- A26 ALL Enterprise 250 -
- A34 ALL Enterprise 220R -
- N14 ALL Netra T-1405 -
- N15 ALL Netra T-1400 -
- N07 ALL Netra T1 100 -
- N06 ALL Netra T1 105 -
- N04 ALL Netra T-1125 -
- N03 ALL Netra T-1120 -
- A27 ALL Ultra 80 -
X-Options Affected
------------------
Mkt_ID Platform Model Description Serial Number
------ -------- ----- ----------- -------------
X2248A - - 480Mhz UltraSPARC II Module 8MB Cache -
X2244A - - 400Mhz UltraSPARC II Module 4MB Cache -
X1994A - - 400Mhz UltraSPARC II Module 2MB Cache -
X2240A - - 300MHz UltraSPARC II Module 2MB Cache -
X2230A - - 250MHz UltraSPARC II Module 1MB Cache -
X1995A - - 450Mhz UltraSPARC II Module 4MB Cache -
X1997A - - 440Mhz UltraSPARC II Module 4MB Cache -
X2580A - - 400MHz UltraSPARC II Module 8MB cache -
X2570A - - 400MHz UltraSPARC II Module 4MB cache -
X1993A - - 400Mhz UltraSPARC II Module 2MB Cache -
X1992A - - 360Mhz UltraSPARC II Module 4MB Cache -
X2560A - - 336MHz UltraSPARC II Module 4MB Cache -
Parts Affected:
Part Number Description Model
----------- ----------- -----
501-5729-04 or lower 480 MHz UltraSPARC II Module 8MB Cache -
501-5344-06 or lower 450 MHz UltraSPARC II Module 4MB Cache -
501-5539-06 or lower 450 MHz UltraSPARC II Module 4MB Cache -
501-5682-04 or lower 440 MHz UltraSPARC II Module 4MB Cache -
501-5235-04 or lower 400 MHz UltraSPARC II Module 8MB Cache -
501-4995-03 or lower 400 MHz UltraSPARC II Module 4MB Cache -
501-5239-05 or lower 400 MHz UltraSPARC II Module 4MB Cache -
501-5420-04 or lower 400 MHz UltraSPARC II Module 4MB Cache -
501-5425-04 or lower 400 MHz UltraSPARC II Module 4MB Cache -
501-5446-04 or lower 400 MHz UltraSPARC II Module 4MB Cache -
501-5500-03 or lower 400 MHz UltraSPARC II Module 4MB Cache -
501-5585-02 or lower 400 MHz UltraSPARC II Module 4MB Cache -
501-5237-04 or lower 400 MHz UltraSPARC II Module 2MB Cache -
501-5445-05 or lower 400 MHz UltraSPARC II Module 2MB Cache -
501-5541-02 or lower 400 MHz UltraSPARC II Module 2MB Cache -
501-5545-01 400 MHz UltraSPARC II Module 2MB Cache -
501-5149-A1 or lower 440 MHz UltraSPARC IIi Module 2MB Cache -
501-5740-01 400 MHz UltraSPARC IIi Module 2MB Cache -
501-5741-01 400 MHz UltraSPARC IIi Module 2MB Cache -
501-4178-04 or lower 250 MHz UltraSPARC II Module 1MB Cache -
501-4363-08 or lower 336 MHz UltraSPARC II Module 4MB Cache -
References:
PatchId: 103640-34 or higher - Kernel Patch (Solaris 2.5.1)
105181-23 or higher - Kernel Patch (Solaris 2.6)
106541-13 or higher - Kernel Patch (Solaris 7)
108528-04 or higher - Kernel Patch (Solaris 8)
FIN: I0570-3
I0616-1
URL: http://bestpractices.central/
http://cte-www.uk/cgi-bin/afsr/afsr.pl
http://onestop.eng/ecache/scrubber_tuning.txt
Issue Description:
UltraSPARC II modules with non-mirrored SRAM are susceptible to Ecache
(E$) parity errors. Systems shipped with mixed-vendor IBM/Sony SRAM CPU
modules are more susceptible to Ecache errors because of the higher
particle emissions (less favorable SER) of the IBM SRAM components.
To reduce the likelihood of Ecache Data, Writeback and CopyOut Parity
errors, a "Cache Scrubber" has been implemented in the Solaris Kernel
that periodically flushes modified cache lines out to main memory and
invalidates cache lines that have not been modified. By reducing the
likelihood that an otherwise nonfatal error in the Ecache will result
in a system failure, this procedure improves the system's reliability.
Solaris Kernel patches are available that provide improved handling and
reduction of Ecache errors in systems using UltraSPARC-II and -IIi
processors. Ecache parity errors on non-mirrored SRAM UltraSPARC
II-based systems result in unplanned system downtime. All customers
using Solaris 2.5.1, 2.6, 7 and 8 are encouraged to upgrade to latest
kernel patches as they become available.
The risk of Ecache parity errors can be further reduced by tuning the
ecache_scan_rate parameter of the Cache Scrubber. It is recommended
that the Cache Scrubber parameter "ecache_scan_rate" be adjusted on
affected systems and that the parameter not be adjusted above 1000.
An ecache_scan_rate of 1000 causes the entire cache to be scrubbed once
per second. Higher values have shown little to no additional benefit
in terms of Ecache error mitigation.
The default setting for ecache_scan_rate is 100. Setting this
parameter to 1000 has been demonstrated to provide additional
mitigation against Ecache errors. Because the measure works mainly by
shortening the time that meaningful data resides in the cache, values
of ecache_scan_rate above 100 but below 1000 may also provide
effective mitigation against Ecache errors.
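As a rough illustration of the scan-rate arithmetic above, the
following shell sketch estimates how long one full scrub pass takes at
a given ecache_scan_rate. It assumes the rate scales linearly from the
single data point given in this FIN (1000 = one full pass per second).
The function name scrub_pass_seconds is hypothetical, and a POSIX shell
and awk are assumed (nawk may be needed on older Solaris releases).

```shell
#!/bin/sh
# Estimate the seconds needed for one full Ecache scrub pass.
# ASSUMPTION: linear scaling from the FIN's data point
# (ecache_scan_rate=1000 scrubs the whole cache once per second).
scrub_pass_seconds() {
    awk -v rate="$1" 'BEGIN {
        # Reject zero, negative, or non-numeric rates.
        if ((rate + 0) <= 0) exit 1
        printf "%.1f\n", 1000 / rate
    }'
}

scrub_pass_seconds 100     # default setting  -> 10.0
scrub_pass_seconds 500     # intermediate     -> 2.0
scrub_pass_seconds 1000    # recommended      -> 1.0
```

Under this assumption, the default of 100 leaves data unscrubbed for up
to roughly ten times longer than the recommended setting of 1000.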
Identifying Candidate Systems
-----------------------------
The following criteria should be used to identify which systems
will most benefit from modifying the Cache Scrubber ecache_scan_rate:
1) System is UltraSPARC-II and does not have mirrored SRAM ("Sombras").
2) System has had 1 or more Ecache errors or similarly configured
systems in the same install base have experienced Ecache errors.
3) Business impact of unplanned downtime on the system is significant.
4) System resides in an environment that has a history of temperature
and humidity control problems or in a region with typically dry
winters.
Scrubber Tuning Performance Impact
----------------------------------
Increasing ecache_scan_rate requires additional CPU resources, though
testing has demonstrated that the typical CPU utilization penalty of
setting ecache_scan_rate to 1000 on a 400MHz+ workgroup or Telco
server is less than 1%. If a particular system appears to be a good
candidate for scrubber tuning, and that system is known or believed to
have periods of 90%+ CPU utilization, first try the setting on a test
system that approximates the production environment and load, to
identify any performance impact of the scrubber setting change.
The following command can be used to observe CPU utilization for 24
hours. The resulting file can then be graphed using a spreadsheet or
other graphing environment. This example samples system CPU idle time
every 10 seconds. The interval and count can be modified to take
more frequent samples or to observe a longer total period.
# vmstat [ interval ] [ count ] | ...
# vmstat 10 8640 | awk '{print $22}' | grep -v id | grep -v '^$' \
      > /path/loadtest.csv &
If utilization on the target system is very high, it may be appropriate
to set ecache_scan_rate to a value greater than the default of 100 but
less than 1000.
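The idle-time file produced by the vmstat command above can also be
summarized without a spreadsheet. The following is a minimal sketch,
assuming the file holds one integer idle percentage per line; the
function name summarize_idle and the 10% idle threshold (matching the
"90%+ CPU utilization" guideline) are illustrative choices, not part of
any Sun tool.

```shell
#!/bin/sh
# Summarize a file of CPU idle percentages (one integer per line).
# busy_samples counts samples with idle <= 10, i.e. utilization >= 90%.
summarize_idle() {
    awk '
        /^[0-9]+$/ {
            n++
            sum += $1
            if (n == 1 || $1 < min) min = $1
            if ($1 <= 10) busy++
        }
        END {
            if (n == 0) { print "no samples"; exit 1 }
            printf "samples=%d avg_idle=%.1f min_idle=%d busy_samples=%d\n",
                n, sum / n, min, busy
        }
    ' "$1"
}

# Usage (path as in the vmstat example above):
#   summarize_idle /path/loadtest.csv
```

A large busy_samples count relative to samples suggests the system
spends significant time near saturation and is a candidate for the
test-system trial described above.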
The risk of Ecache parity errors is diminished by tuning the
ecache_scan_rate parameter of the cache scrubber. The only other fix
for Ecache parity errors is mirrored SRAM, which is not available for
UltraSPARC II-based midrange and Telco platforms.
Implementation:
---
| | MANDATORY (Fully Proactive)
---
---
| X | CONTROLLED PROACTIVE (per Sun Geo Plan)
---
---
| | REACTIVE (As Required)
---
Corrective Action:
The following recommendation is provided as a guideline for authorized
Enterprise Services Field Representatives who may encounter the above
mentioned problem.
Using the criteria given above to identify candidate systems, adjust
the kernel Cache Scrubber ecache_scan_rate as needed. Note that
although the procedure to adjust ecache_scan_rate is non-intrusive and
does not require a reboot, it is recommended that it be done during a
scheduled maintenance window.
To adjust ecache_scan_rate:
1. As root, run the following command to adjust ecache_scan_rate.
# echo 'ecache_scan_rate/W 0t1000' | adb -kw
NOTE: This does not require downtime. Be very careful, though,
as mis-typing the command could result in downtime.
2. To make the change permanent, add the parameter setting to
/etc/system. It is best to insert all 3 parameters together into
/etc/system if the settings are not already there:
set ecache_scrub_enable=1
set ecache_scan_rate=1000
set ecache_calls_a_sec=100
NOTE: ecache_scrub_enable is already set to 1 by default.
NOTE: If the settings already exist in /etc/system, simply modify
"ecache_scan_rate=100" to "ecache_scan_rate=1000".
NOTE: The ecache_scan_rate value should be 1000. A lower value may be
beneficial in theory, but only 1000 has been demonstrated to be
beneficial. In the unlikely event that a negative performance
impact is observed, the value can then be set back to some lower
value.
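Before editing /etc/system as described in step 2, it can help to see
which of the three settings are already present. The following is a
read-only sketch; the function name check_scrubber_settings is
hypothetical, and the pattern assumes a grep that supports POSIX
character classes (for example /usr/xpg4/bin/grep on Solaris). Lines
commented out with a leading "*" are deliberately not matched.

```shell
#!/bin/sh
# Report which of the three scrubber settings appear as active
# "set name=value" lines in the given file. Never modifies the file.
check_scrubber_settings() {
    file="$1"
    for param in ecache_scrub_enable ecache_scan_rate ecache_calls_a_sec; do
        # Keep the last active match, ignoring commented-out lines.
        line=$(grep "^[[:space:]]*set[[:space:]][[:space:]]*${param}[[:space:]]*=" "$file" | tail -1)
        if [ -n "$line" ]; then
            echo "$param: present ($line)"
        else
            echo "$param: missing"
        fi
    done
}

# Usage:
#   check_scrubber_settings /etc/system
```

Settings reported as missing would be added, and an existing
ecache_scan_rate line edited, as the notes in step 2 describe.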
To check a system's current setting use the following command.
This does not modify the setting in any way:
# echo 'ecache_scan_rate/D' | adb -k
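Because a mistyped adb write can cause downtime, one cautious approach
is to generate the command text from a validated value and review it
before pasting it into a root shell. The sketch below only prints the
commands; it never runs adb. The function name
print_scan_rate_commands is hypothetical, and the lower bound of 100 is
an assumption based on the default value (this FIN only states the
upper limit of 1000).

```shell
#!/bin/sh
# Print (do not execute) the adb commands for a proposed
# ecache_scan_rate: check, write, then check again.
print_scan_rate_commands() {
    rate="$1"
    # Accept digits only, so typos such as "10O0" are rejected.
    case "$rate" in
        ''|*[!0-9]*)
            echo "error: rate must be a whole number" >&2
            return 1 ;;
    esac
    # ASSUMPTION: 100 (the default) is used as the lower bound.
    if [ "$rate" -lt 100 ] || [ "$rate" -gt 1000 ]; then
        echo "error: rate must be between 100 and 1000" >&2
        return 1
    fi
    echo "echo 'ecache_scan_rate/D' | adb -k"
    echo "echo 'ecache_scan_rate/W 0t$rate' | adb -kw"
    echo "echo 'ecache_scan_rate/D' | adb -k"
}

print_scan_rate_commands 1000
```

The printed lines match the check and adjust commands given in this
FIN, so they can be compared against the text before use.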
Additional reference:
http://onestop.eng/ecache/scrubber_tuning.txt
Comments:
None
============================================================================
Implementation Footnote:
i) In case of MANDATORY FINs, Enterprise Services will attempt to
contact all affected customers to recommend implementation of
the FIN.
ii) For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical
support teams will recommend implementation of the FIN (to their
respective accounts), at the convenience of the customer.
iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the
need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network
browser as follows:
SunWeb Access:
--------------
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/
* From there, select the appropriate link to query or browse the FIN and
FCO Homepage collections.
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/
* From there, select the appropriate link to browse the FIN or FCO index.
Supporting Documents:
---------------------
* Supporting documents for FIN/FCOs can be found on Edist. Edist can be
accessed internally at the following URL: http://edist.corp/.
* From there, follow the hyperlink path of "Enterprise Services
Documentation" and click on "FIN & FCO attachments", then choose the
appropriate folder, FIN or FCO. This will display supporting
directories/files for FINs or FCOs.
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to finfco-manager@Sun.COM
---------------------------------------------------------------------------