Asset ID: 1-77-1000479.1
Update Date: 2011-02-16
Keywords:
Solution Type
Sun Alert Sure
Solution 1000479.1:
Running CE Driver at 100Mb in Forced Mode May Cause PCI IOMMU Panic and/or Other Operational Issues
Related Categories
- GCS>Sun Microsystems>Sun Alert>Criteria Category>Availability
- GCS>Sun Microsystems>Sun Alert>Release Phase>Resolved
Previously Published As
200617
Product
Solaris 9 Operating System
Solaris 10 Operating System
Solaris 8 Operating System
Bug Id
<SUNBUG: 6217062>
Date of Workaround Release
01-NOV-2005
Date of Resolved Release
01-FEB-2006
Impact
Running the CE Ethernet driver (see ce(7D)) in forced mode
(autonegotiation disabled) may cause a system panic, a link-down
condition, or similar hardware-related issues.
Contributing Factors
This issue can occur on the following platforms:
SPARC Platform
- Solaris 8, 9 and 10 systems (with the CE Ethernet connection
configured for 100Mb with autonegotiation disabled)
x86 Platform
- Solaris 8, 9 and 10 systems (with the CE Ethernet connection
configured for 100Mb with autonegotiation disabled)
Notes:
- This condition has not yet been reported at 10Mb; it has been seen
only when the CE driver is configured for 100Mb full-duplex with
autonegotiation disabled.
- Failures appear to be independent of hardware type (Saturn vs.
Cassini, on-board vs. add-in card) and traffic loading.
- Failures may depend on the switch used, but this has not been
reproduced or proven to be a factor.
- Failure may also occur when the CE driver is configured with
autonegotiation enabled but the switch port is in forced mode, leading
to a mismatch between the link partners.
To date, no combination of patch levels has been shown to have any
effect on the failure symptoms.
To determine whether CE (ce(7D)) ports are configured on a system, run
the following command:
% ifconfig -a
lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
ce1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
inet 10.8.55.60 netmask ffffff00 broadcast 10.8.55.255
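On a system with many interfaces, this check can be scripted. The following is a minimal Python sketch, not part of the original alert. It assumes the `ifconfig -a` output has already been captured as text, and the helper name `ce_interfaces` is hypothetical. It only picks out interface names that begin with `ce`.

```python
import re

# Sample text copied from the ifconfig -a output shown above.
SAMPLE = """\
lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
ce1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
inet 10.8.55.60 netmask ffffff00 broadcast 10.8.55.255
"""


def ce_interfaces(ifconfig_text):
    """Return the names of ce interfaces found in captured ifconfig -a output."""
    names = []
    for line in ifconfig_text.splitlines():
        # Interface lines look like "ce1: flags=...". Logical interfaces
        # such as "ce1:1: flags=..." are folded into the physical name.
        match = re.match(r"\s*(ce\d+)(?::\d+)?: flags=", line)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


if __name__ == "__main__":
    print(ce_interfaces(SAMPLE))
```

An empty list from this helper only means that no ce interface appeared in the captured text.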
To determine the link speed and/or duplex mode used by the CE driver,
run the following commands:
# kstat -p ce | grep link_speed
# kstat -p ce | grep link_duplex
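The output of these commands can be summarized per interface with a short script. The Python sketch below is illustrative only. It assumes `kstat -p` lines have the form `module:instance:name:statistic value`, as in the `lp_` output shown in this alert. It also assumes that `link_speed` is reported in Mbps and that `link_duplex` uses 1 for half and 2 for full duplex; verify both against the ce(7D) documentation for your release. The sample values and the helper names `link_settings` and `describe` are hypothetical.

```python
# Hypothetical kstat -p output; the values are invented for illustration.
SAMPLE = """\
ce:0:ce0:link_duplex\t2
ce:0:ce0:link_speed\t100
ce:1:ce1:link_duplex\t1
ce:1:ce1:link_speed\t100
"""

# Assumed meaning of link_duplex values; check the driver documentation.
DUPLEX_NAMES = {0: "unknown", 1: "half", 2: "full"}


def link_settings(kstat_text):
    """Collect link_speed and link_duplex per ce interface."""
    result = {}
    for line in kstat_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        fields = parts[0].split(":")
        if len(fields) != 4 or fields[0] != "ce":
            continue
        name, stat = fields[2], fields[3]
        if stat in ("link_speed", "link_duplex"):
            result.setdefault(name, {})[stat] = int(parts[1])
    return result


def describe(settings):
    """Return one human-readable line per interface."""
    lines = []
    for name in sorted(settings):
        speed = settings[name].get("link_speed", 0)
        duplex = DUPLEX_NAMES.get(settings[name].get("link_duplex", 0), "unknown")
        lines.append(f"{name}: {speed} Mbps, {duplex} duplex")
    return lines


if __name__ == "__main__":
    for line in describe(link_settings(SAMPLE)):
        print(line)
```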
The following command and its output can be used to check whether
there is a mismatch between the link partners:
$ kstat -p | grep lp_
ce:0:ce0:lp_cap_1000fdx 1
ce:0:ce0:lp_cap_1000hdx 0
ce:0:ce0:lp_cap_100T4 0
ce:0:ce0:lp_cap_100fdx 0
ce:0:ce0:lp_cap_100hdx 1
ce:0:ce0:lp_cap_10fdx 0
ce:0:ce0:lp_cap_10hdx 0
ce:0:ce0:lp_cap_asmpause 0
ce:0:ce0:lp_cap_autoneg 1
ce:0:ce0:lp_cap_pause 0
If autonegotiation succeeded, lp_cap_autoneg should be 1. If all the
other fields in the above output are 0, it can be assumed that
autonegotiation failed, which indicates a mismatch between the link
partners.
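The rule of thumb above can be applied mechanically to captured output. The following Python sketch is one possible reading of that rule, not an official diagnostic. It flags an interface when lp_cap_autoneg is not 1 or when every other lp_cap_ field is 0. The helper name `link_partner_report` and the result strings are hypothetical; the sample text is the output shown above.

```python
# Sample text copied from the kstat output shown above.
SAMPLE = """\
ce:0:ce0:lp_cap_1000fdx 1
ce:0:ce0:lp_cap_1000hdx 0
ce:0:ce0:lp_cap_100T4 0
ce:0:ce0:lp_cap_100fdx 0
ce:0:ce0:lp_cap_100hdx 1
ce:0:ce0:lp_cap_10fdx 0
ce:0:ce0:lp_cap_10hdx 0
ce:0:ce0:lp_cap_asmpause 0
ce:0:ce0:lp_cap_autoneg 1
ce:0:ce0:lp_cap_pause 0
"""


def link_partner_report(kstat_text):
    """Group lp_cap_* values by interface and apply the rule of thumb."""
    caps = {}
    for line in kstat_text.splitlines():
        parts = line.split()
        if len(parts) != 2 or ":lp_cap_" not in parts[0]:
            continue
        fields = parts[0].split(":")
        if len(fields) != 4:
            continue
        caps.setdefault(fields[2], {})[fields[3]] = int(parts[1])

    report = {}
    for name, values in caps.items():
        autoneg = values.get("lp_cap_autoneg", 0)
        others = [v for k, v in values.items() if k != "lp_cap_autoneg"]
        # Suspect a mismatch if the partner did not autonegotiate, or if
        # it advertised no capabilities at all.
        suspect = autoneg != 1 or not any(others)
        report[name] = "possible mismatch" if suspect else "autonegotiation looks ok"
    return report


if __name__ == "__main__":
    print(link_partner_report(SAMPLE))
```

A "possible mismatch" result is only a prompt to inspect the switch port configuration; it does not prove that the link partners disagree.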
Symptoms
One of the following symptoms may occur:
1. PCI IOMMU errors occur, pointing to a bus that contains the PCI
card or onboard port using the CE driver, as in the following example:
Sep 7 21:12:12 examplebox pcisch:
[ID 462479 kern.warning] WARNING: pcisch2 (pci@9,700000): PCI fault log start:
Sep 7 21:12:12 examplebox pcisch:
[ID 309153 kern.notice] PCI iommu error
Sep 7 21:12:12 examplebox pcisch:
[ID 866426 kern.notice] pcisch2: Error 1 on IOMMU TLB entry b:
Sep 7 21:12:12 examplebox Context=0 not Writable not Streamable
Sep 7 21:12:12 examplebox PCI Page Size=8k Address in page c1b30000
Sep 7 21:12:12 examplebox pcisch: [ID 219581 kern.notice]
Memory: Valid not Cacheable Page Frame=0
Sep 7 21:12:12 examplebox pcisch: [ID 684763 kern.notice]
pcisch2 (pci@9,700000): PBM AFSR=0x0.00000000
Sep 7 21:12:12 examplebox pcisch: [ID 120591 kern.notice]
dwordmask=0 bytemask=0
Sep 7 21:12:12 examplebox pcisch: [ID 829486 kern.notice]
pcisch2 (pci@9,700000): PCI primary error (0):
Sep 7 21:12:12 examplebox pcisch: [ID 227296 kern.notice]
pcisch2 (pci@9,700000): PCI secondary error (0):
Sep 7 21:12:12 examplebox pcisch: [ID 748186 kern.notice]
pcisch2 (pci@9,700000): PBM AFAR 0.00000000:
Sep 7 21:12:12 examplebox pcisch: [ID 127741 kern.warning]
WARNING: pcisch2: PCI config space CSR=0x2a80<signaled-target-abort,
received-master-abort>
Sep 7 21:12:12 exampleboxt log end.
Sep 7 21:12:12 examplebox unix: [ID 836849 kern.notice]
Sep 7 21:12:12 examplebox ^Mpanic[cpu7]/thread=30016fcfd00:
Sep 7 21:12:12 examplebox unix: [ID 578303 kern.notice]
pcisch-2: PCI bus 1 error(s)!
Sep 7 21:12:12 examplebox unix: [ID 100000 kern.notice]
Sep 7 21:12:12 examplebox genunix: [ID 723222 kern.notice]
000002a100077ea0 pcisch:pbm_error_intr+164 (30006cdfe18, 273,
3000019a398, 3, 30006cdfe18, 1)
...
Sep 7 21:12:12 examplebox unix: [ID 100000 kern.notice]
Sep 7 21:12:12 examplebox genunix: [ID 672855 kern.notice]
syncing file systems...
Sep 7 21:12:13 examplebox genunix: [ID 733762 kern.notice] 1
Sep 7 21:12:16 examplebox last message repeated 1 time
Sep 7 21:12:18 examplebox genunix: [ID 904073 kern.notice] done
Sep 7 21:12:19 examplebox genunix: [ID 353387 kern.notice]
dumping to /dev/dsk/c1t0d0s7, offset 65536
2. The system panics repeatedly. A stack trace from the core file
will show something similar to the following:
pc: 0x10048b54 unix:panicsys+0x44: call unix:setjmp
startpc: 0x1011c264 genunix:thread_create_intr+0x0: save %sp, -0xc0, %sp
unix:panicsys+0x44(0x10147e18, 0x2a10007ca98, 0x104241e0, 0x1, 0x2000, , 0x80001607,
0x10147e18, 0x2a10007ca98)
unix:vpanic+0xcc(0x10147e18, 0x2a10007ca98, 0x25c, 0x2a10007c9f8, 0x100bcd78,
0x3002787495a)
unix:panic+0x1c(0x10147e18, 0x30059d1c000, 0x10438f08, 0x30059b25460, 0x8, 0x0)
genunix:kmem_error+0x448(0x0, 0x30000035b00, 0x30059d1c000, , 0x30000035b00?,
0x30059d1c000?)
genunix:kmem_cache_alloc_debug+0xf8(, 0x30059d1c000?, 0x1)
genunix:kmem_cache_alloc(0x30000035b00, 0x1) - frame recycled
genunix:kmem_alloc+0x2c(0x2000, 0x1, , , 0x300285666d0, 0x78220800)
ce:ce_allocb+0xc(0x2000, 0x1)
ce:ce_replace_page+0xa8(0x3002f945e58, 0x30030c263b8, 0x2f, 0x2f0, 0x40,
0x30027a26040)
ce:ce_intr+0xed0(0x3002f945e58, , , 0x0, , 0x30027e90020)
pcisch:pci_intr_wrapper+0x80(0x3002ea64880?)
unix:intr_thread+0xa4(0x0, 0x0, 0x1041ccc0, 0x104245b0, 0x16, 0x0)
unix:prom_rtt+0x0()
-- interrupt data rp: 0x2a10001f9c0
pc: 0x10044454 unix:idle+0x6c: andcc %g2, 0x4 ( btst %g2, 0x4 )
npc: 0x10044458 unix:idle+0x70: bne,a,pt %icc, unix:idle+0x6c
global: %g1 0x3002f72d000
%g2 0x1b %g3 0
%g4 0x7 %g5 0
%g6 0 %g7 0x2a10001fd20
out: %o0 0 %o1 0
%o2 0x1041ccc0 %o3 0x104245b0
%o4 0x16 %o5 0
%sp 0x2a10001f261 %o7 0x10044494
loc: %l0 0x10045e64 %l1 0
%l2 0 %l3 0x2a1000e5d20
%l4 0 %l5 0
%l6 0 %l7 0
in: %i0 0 %i1 0xffffffffffffffff
%i2 0x104245b0 %i3 0x1041bc78
%i4 0x3 %i5 0x1041bc00
%fp 0x2a10001f311 %i7 0x1002ba00
<intr trap>unix:idle+0x6c(0x0, 0x0, 0x104245b0)
unix:thread_start+0x4()
Further detailed analysis of the core file may show that the CE
driver (DMA engine) has written a valid Ethernet packet to a buffer
that has already been freed.
3. Messages similar to the
following may appear regularly:
Mar 25 03:10:42 cdma1 genunix: [ID 408789 kern.notice] NOTICE:
ce5: fault cleared external to device; service available
Mar 25 03:10:42 cdma1 genunix: [ID 451854 kern.notice] NOTICE:
ce5: xcvr addr:0x00 - link up 1000 Mbps full duplex
Mar 25 03:10:42 cdma1 genunix: [ID 408789 kern.warning] WARNING:
ce5: fault detected external to device; service degraded
Mar 25 03:10:42 cdma1 genunix: [ID 451854 kern.warning] WARNING:
ce5: xcvr addr:0x00 - link down
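To judge how regularly such messages appear, the link transitions can be counted per interface. The Python sketch below is a rough aid rather than a supported tool. It assumes the messages have been saved as text in the format shown above, and the helper name `count_link_events` is hypothetical.

```python
import re
from collections import Counter

# Sample text copied from the messages shown above.
SAMPLE = """\
Mar 25 03:10:42 cdma1 genunix: [ID 408789 kern.notice] NOTICE:
ce5: fault cleared external to device; service available
Mar 25 03:10:42 cdma1 genunix: [ID 451854 kern.notice] NOTICE:
ce5: xcvr addr:0x00 - link up 1000 Mbps full duplex
Mar 25 03:10:42 cdma1 genunix: [ID 408789 kern.warning] WARNING:
ce5: fault detected external to device; service degraded
Mar 25 03:10:42 cdma1 genunix: [ID 451854 kern.warning] WARNING:
ce5: xcvr addr:0x00 - link down
"""

# Matches the "link up" / "link down" transceiver messages for ce ports.
LINK_EVENT = re.compile(r"\b(ce\d+): xcvr addr:\S+ - link (up|down)\b")


def count_link_events(log_text):
    """Count link up and link down messages per ce interface."""
    counts = Counter()
    for match in LINK_EVENT.finditer(log_text):
        counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    for (name, state), total in sorted(count_link_events(SAMPLE).items()):
        print(f"{name}: link {state} x{total}")
```

A steadily growing count of "link down" events on the same port is consistent with the symptom described above.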
Workaround
Configure the CE interface to operate with autonegotiation enabled.
Also make sure that the corresponding link partner (switch port) is
configured to operate with autonegotiation enabled, and that the
advertised capabilities of the two link partners match.
The 100baseT issues noted in this Sun Alert have been eliminated at
all reporting sites by reconfiguring the Ethernet ports connected to
the Cassini ports so that both ends are configured consistently.
Mismatched configurations of the switch and NIC port are not
supported. Such mismatches can result in "rx_tag_errors" which can, in
rare cases, lead to the type of system panics described in this Sun
Alert.
Following the recommendations in Sun document 817-6925-10 "Maximizing
Performance of a Gigabit Ethernet NIC Interface" is critical to
avoiding many system issues, including those described here.
This Sun "Best Practices" Blueprint document, available at
http://www.sun.com/blueprints/0404/817-6925.pdf, gives the recommended
practices for operating the Ethernet link.
Resolution
Please see the
Relief/Workaround section for the resolution to this issue.
Modification History
Date: 11-NOV-2005
- Updated Impact and Contributing Factors sections
Date: 13-DEC-2005
- Updated Contributing Factors and Relief/Workaround sections
Date: 01-FEB-2006
- Updated Relief/Workaround section, re-release as resolved
Date: 16-APR-2007
- Updated Impact, Contributing Factors, Symptoms, and
Relief/Workaround sections
Previously Published As
102015
Internal Comments
Internal Contributor/submitter
mick.tabor@sun.com
Internal Eng Business Unit Group
SSG NSN (Netra Systems and Networking)
Internal Eng Responsible Engineer
mick.tabor@sun.com
Internal Services Knowledge Engineer
david.mariotto@sun.com
Internal Escalation ID
1-12212204, 1-11932350, 1-12622511, 1-8280969, 1-10417507, 1-12503684
Internal Sun Alert Kasp Legacy ID
102015
Internal Sun Alert & FAB Admin Info
Critical Category: Availability ==> Pervasive
Significant Change Date: 2005-11-01, 2006-02-01
Avoidance: Workaround
Responsible Manager: fernando.bonaventura@sun.com