--
Craig
,,, Wot, NO mountains!
======================oOO=(o o)=OOo===================================
Craig Morgan (_) Lecturer, CS Group
School of Computing Email: C.Morgan_at_soc.staffs.ac.uk
Staffordshire University Phone: +44 (0)1785 353466
Beaconside Fax: +44 (0)1785 353497
Stafford, UK ST18 0DG Pager: +44 (0)839 453754
"It's the downhill thrills, that make the uphill slog worthwhile..."
======================================================================
===========================================================================
From: Hellebo Knut <Knut.Hellebo_at_nho.hydro.com>
Regards,
At least for 3.0 I know there are patches for the tulip drivers. Maybe they
didn't make it in time for 3.2 and you still have to install them?
Contact DEC for info.
--
******************************************************************
* Knut Helleboe | DAMN GOOD COFFEE !! *
* Norsk Hydro a.s | (and hot too) *
* Phone: +47 55 996870, Fax: +47 55 996342 | *
* Pager: +47 96 500718 | *
* E-mail: Knut.Hellebo_at_nho.hydro.com | Dale Cooper, FBI *
******************************************************************
===========================================================================
From: Martyn Johnson <Martyn.Johnson_at_cl.cam.ac.uk>
I think I've read somewhere that different ethernet controller chips REPORT
collisions differently. For example, if a packet collides 3 times and then
goes on the fourth attempt, some chips report that as 1 collision (because one
packet collided) whereas others report it as 3 collisions (because that's what
happened on the wire).
Fundamentally, whether a transmission collides or not is going to depend on
what is on the wire rather than the particular controller chip. Apart from
pathological timing effects, the performance of a particular chip or board is
unlikely to have any effect, except in so far as a high-performance interface
will load the network more and hence increase the general collision rate.
My guess is that the general difference you are seeing between lance-based and
tulip-based interfaces is an artefact. I suspect that there is some hardware
problem with the machine that is absurdly bad - either the machine itself is
faulty or there is some problem with its connection.
I only have one tulip-based machine, and its ethernet performance seems fine
to me (about 7.1 to 7.6 Mbit/s throughput using TCP with the machine in normal
service). It is running 3.2A. It is not meaningful for me to compare collision
rates because we are using switched ethernet, so traffic levels on different
segments vary anyway.
I suggest that you pay less attention to collision rate and start measuring
throughput with something like ttcp. Throughput is, after all, what actually
matters.
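
For readers without ttcp to hand, the idea can be sketched in a few lines of
Python: push a known number of bytes through a TCP connection, time it, and
divide. This is only an illustration over the loopback interface, not a
replacement for ttcp; the function names, buffer sizes, and the choice of
127.0.0.1 are assumptions of this sketch. A real test would run the receiver
on another host across the segment in question.

```python
import socket
import threading
import time


def mbit_per_second(nbytes, seconds):
    """Convert a byte count and elapsed time into megabits per second."""
    return nbytes * 8 / seconds / 1_000_000


def measure_loopback(total_bytes=4 * 1024 * 1024, chunk=64 * 1024):
    """Send total_bytes over a local TCP connection and return Mbit/s.

    Hypothetical helper for illustration only.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    received = []

    def sink():
        conn, _ = server.accept()
        count = 0
        while True:
            data = conn.recv(chunk)
            if not data:               # sender closed the connection
                break
            count += len(data)
        conn.close()
        received.append(count)

    receiver = threading.Thread(target=sink)
    receiver.start()

    payload = b"\0" * chunk
    sender = socket.create_connection(("127.0.0.1", port))
    start = time.perf_counter()
    sent = 0
    while sent < total_bytes:
        sender.sendall(payload)
        sent += len(payload)
    sender.close()
    receiver.join()                    # stop the clock once all data arrived
    elapsed = time.perf_counter() - start
    server.close()
    return mbit_per_second(received[0], elapsed)


if __name__ == "__main__":
    print(f"{measure_loopback():.1f} Mbit/s over loopback")
```

The clock is stopped only after the receiver has read everything, so the
figure reflects delivered data rather than data merely queued by the sender.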
--
Martyn Johnson maj_at_cl.cam.ac.uk
University of Cambridge Computer Lab
Cambridge UK
===========================================================================
From: Dave Cherkus <cherkus_at_UniMaster.COM>
You can't directly compare lance and tulip reports this way. Here's
something I wrote a while ago on this topic:
Newsgroups: comp.unix.osf.osf1
Subject: Re: V3.0 E-Net Collisions with ftp
Organization: UniMaster, Inc.
Date: Wed, 4 Jan 1995 02:29:32 GMT
You are making a reasonable yet inaccurate assumption that the counters
are maintained the same way on both machines, but they are not because
the interfaces use two different chips and the chips used in the tu0
interface are more accurate than the ones used in the 2000/300 (ln0?)
interface.
The AMD LANCE ethernet chip, used in the 2000/300 and also used for
many years in DEC and many other vendors' equipment, tells the kernel
one of the following things happened after a frame is transmitted:
- no collisions occurred
- exactly one collision occurred
- two or more collisions occurred
The Ethernet standard says that up to 15 collisions can occur before
the transmission is aborted, so the LANCE does not communicate the
full story back to the kernel.
The kernel increments the netstat collision counter once when exactly
one collision occurred, and by two when two or more collisions
occurred. This is inaccurate, but it's the best the kernel could do.
It's not just inaccurate, it's always optimistic. This is why you
think you are getting 'excessive' collisions - you've been lied to
by the AMD LANCE in the past.
The older DEC SGEC chip (ne0) and the newer DEC TGEC chip (te0, tu0)
can tell the kernel exactly how many collisions occurred, and this is
what netstat reports. The AMD LANCE used in TurboChannel and ISA
systems is fading into the sunset...
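
To make the difference concrete, here is a small Python sketch of the two
counting schemes as described above. The per-frame collision counts are
made-up illustrative input and the function names are hypothetical; this
models the bookkeeping only, not any real driver.

```python
def lance_style_count(per_frame_collisions):
    """Counter as kept for the LANCE: +1 for exactly one collision,
    +2 for two or more, since the chip reports nothing finer."""
    total = 0
    for n in per_frame_collisions:
        if n == 1:
            total += 1
        elif n >= 2:
            total += 2
    return total


def exact_count(per_frame_collisions):
    """Counter as kept for chips that report the true number per frame."""
    return sum(per_frame_collisions)


# Illustrative traffic: collisions suffered by each of six transmitted frames.
frames = [0, 1, 3, 0, 5, 2]
print(lance_style_count(frames))  # 1 + 2 + 2 + 2 = 7
print(exact_count(frames))        # 1 + 3 + 5 + 2 = 11
```

The same traffic shows a lower figure under the LANCE scheme, and the gap
grows with the number of frames that collide three or more times.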
You can identify which chip is being used by the message that appears
at boot time, or by the interface name (ln0 is AMD LANCE, most of the
others are tu0).
If you feel more comfortable with the 'classic' statistic, you can run
the command
# netstat -I tu0 -is
and look for 'single collision' and 'multiple collision', then add the
'single collision' count to two times the 'multiple collision' count to
get the 'classic' statistic.
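
The arithmetic is simple enough to script. A sketch in Python, taking the two
counter values read by hand from the netstat output (the function name and
the sample numbers are illustrative assumptions, not real measurements):

```python
def classic_collisions(single, multiple):
    """LANCE-style figure: single-collision frames count once,
    multiple-collision frames count twice."""
    return single + 2 * multiple


print(classic_collisions(single=1200, multiple=300))  # 1200 + 600 = 1800
```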
--
Dave Cherkus ----- UniMaster, Inc. ----- Contract Software Development
Specialties: UNIX TCP/IP X OSF/1 AlphaAXP AIX RS/6000 Performance ISDN
Email: cherkus_at_UniMaster.COM Tel: (603) 888-8308 Fax: (603) 888-8308
if (cpu.type == PENTIUM && cpu.step < 8) { panic("Intel Inside!"); }
===========================================================================
From: Mike Iglesias <iglesias_at_draco.acs.uci.edu>
See the message included below for an answer to your question. I got
it from the WAIS search feature of the
http://www-archive.stanford.edu/lists/alpha-osf-managers/hyper/
archive.
Mike
[S] Tulip Ethernet Controller Collision Rate
Bivins, Jeff (BIVINS_at_nebeng.otis.utc.com)
Sat, 30 Sep 1995 10:36:32 -0600 (CST)
My Original question is:
> Hello all,
> I have 35 AlphaStation 250 4/266 workstations and 2 AlphaServer 2100 4/233
> servers. All of these machines have a DEC TULIP PCI ethernet card. When
> using the 'monitor' tool I see on average a 30-40 percent collision rate
> on a high throughput transfer.
> When I send a large file from one of these machines to a DECsystem 5900, the
> high collision rate only exists on the Alpha side and not the DECsystem
> side.
> Is this a tuning issue ?
Nope. It's normal.
> How can I resolve this issue ?
Thanks to those who responded
Matt Thomas
Dave Cherkus
J. Dean Brock
Dave Golden
The consensus is that the TULIP controller reveals accurate statistics on
collisions, where the LANCE controller does not.
I will look at this problem from a network perspective.
Thanks,
Jeff
===========================================================================
From: David Lucas <dlucas_at_worldbank.org>
Jim -
We noticed the same problem with our 2 2100s in a DECsafe ASE
environment. One of our Digital support people dug around in the
internal archives and found a paper entitled, "The Ethernet Capture
Effect: Analysis and Solution", K.K. Ramakrishnan and Henry Yang, (rama,
yang_at_erlang.enet.dec.com).
In a nutshell, the abstract describes the effect as a situation "where a
station transmits consecutive packets exclusively for a prolonged period
despite other stations contending for access." Essentially, the Tulip
interfaces, when transmitting, take over the wire never giving other
systems a chance to send their packets. The solution is a proposed
algorithm, Capture Avoidance Binary Exponential Backoff, that includes
"an enhanced backoff algorithm for collision resolution in the special
case when a station attempts to capture the channel subsequent to an
uninterrupted consecutive transmit."
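
The mechanism behind the capture effect can be seen from standard truncated
binary exponential backoff, in which a station that has collided n times on
a frame waits a random number of slot times between 0 and 2^min(n,10) - 1.
A station that has just transmitted successfully starts its next frame with
a fresh counter, while the station it beat keeps its higher count. The
Python sketch below computes the exact odds for the next collision between
the two; a tie means they collide again. It illustrates ordinary backoff
only, not the Capture Avoidance algorithm from the paper, and the function
name is hypothetical.

```python
from fractions import Fraction


def next_round_odds(loser_collisions):
    """Odds after a collision between a station on its 1st collision for a
    new frame (the previous winner) and one on its n-th (the previous loser).

    Returns (winner_goes_first, loser_goes_first, tie) as exact fractions.
    """
    winner_slots = range(2 ** 1)                          # picks 0 or 1
    loser_slots = range(2 ** min(loser_collisions, 10))   # 0 .. 2^n - 1
    total = len(winner_slots) * len(loser_slots)
    win = sum(1 for w in winner_slots for l in loser_slots if w < l)
    lose = sum(1 for w in winner_slots for l in loser_slots if l < w)
    tie = total - win - lose
    return Fraction(win, total), Fraction(lose, total), Fraction(tie, total)


for n in (1, 2, 3, 4):
    print(n, next_round_odds(n))
```

With the loser on its third collision the previous winner already goes first
13 times out of 16, which is how one fast sender can hold the wire for long
runs.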
Of course, none of this offers much practical advice on how to fix the
immediate problem. In our case, we believed our Alphas were having a
negative effect on our overall network, and simply bridged them onto
their own segment. It hasn't much improved the performance for those 2
systems, but at least our network guys can't point the finger at us when
they do have problems. :)
The paper is 31 pages long, and I don't have an electronic copy. What I
can try and do is scan it and mail it to you. (I have no way of making
a document available for anonymous ftp.) It may take a day or so, as
it's a bit hectic today.
Hope this is of some help to you.
d.
=======================================================================
David Lucas E-mail: dlucas_at_worldbank.org
The World Bank Phone: 202.458.5214
Practice random, senseless acts.
===========================================================================
From: Selden E Ball Jr <SEB_at_LNS62.LNS.CORNELL.EDU>
Jim,
I just took a quick look at the e'net interfaces on our Alphas.
We have old and new "tulip" systems as well as lots of 3000 series systems.
As best I can tell, the collision rates of both types are consistent
with the traffic on the ethernet segments to which they are connected.
Have you compared the collision rates of all of the systems
which are plugged into the same hub? I'd expect the ratio
of Opkts/Coll to be about the same there.
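
A quick way to make that comparison is to turn the Opkts and Coll columns
from netstat -i into a percentage for each host. A sketch in Python; the
host names and counter values are invented for illustration:

```python
def collision_percent(opkts, coll):
    """Collisions per 100 output packets; 0.0 if nothing has been sent."""
    return 100.0 * coll / opkts if opkts else 0.0


# Hypothetical readings from machines on the same hub.
hosts = {"alpha1": (200000, 50000), "alpha2": (150000, 36000)}
for name, (opkts, coll) in sorted(hosts.items()):
    print(f"{name}: {collision_percent(opkts, coll):.1f}%")
```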
Selden
===========================================================================
From: "Jonathan B. Craig" <jcraig_at_i2k.net>
I don't know but I have been testing DEC NSR and have found that network
backups on my (very early model) DEC 2100 w/ Tulip cards have an
incredible number of collisions (50% normal). If you get a suitable
response let me know!
--
Jonathan B. Craig jcraig_at_gfoods.com
Gordon Food Service
===========================================================================
From: nick_at_alldata.com (Frank "Nick" Riley)
I was reading through the archive a month or so ago, and I recall
reading a bunch of messages regarding a bug in the TULIP driver in
DU 3.? that required a patch. The symptom was intermittent "voids" in
the interface where absolutely no traffic passed. Look through the
archive at http://www.ornl.gov/cts/archives/mailing-lists/ and search
for "TULIP".
===========================================================================
From: ccult1!bommel!dehartog_at_relay.nl.net
Hello Jim,
You may want to ask your friendly Digital support people for
the patch: OSF350-070 (it's mandatory!).
Good luck!
===========================================================================
From: em_at_icess.ucsb.edu (Ed Mehlschau)
We received a tulip interface in a new AlphaStation that yielded very
poor performance until it was configured to run half-duplex instead of
full-duplex. Apparently DEC ships them in the full-duplex configuration.
I have been told that the config is changed from the boot PROM, but I
don't know the exact incantation offhand, sorry.
-- Ed
===========================================================================
From: anthony baxter <anthony.baxter_at_aaii.oz.au>
Just as a data point, I just checked our 4/233's and they all show
similar numbers (anything from 20% to 30%). These are 3.2A systems (they
go to 3.2C next week), and they show the same boot info for the tulip
card as your systems. They're plugged into a switching hub, so there is
no way in hell they should be seeing that level of errors.
tu0: DECchip 21040-AA: Revision: 2.3
tu0: DEC TULIP Ethernet Interface, _hardware address: 08-00-2B-E4-56-EF
I'd be very interested in anything you find out - I'm hoping it's just
a bug in the reporting code, but in any case it would be good to have it
fixed...
Anthony
===========================================================================
And a couple things I found in the A-O-M archives.
===========================================================================
Subject: (belated) SUMMARY: ethernet constipation on 2100 A500MP
X-Url: http://www.ornl.gov/its/archives/mailing-lists/alpha-osf-managers/1995/02/msg00346.html
Back in (I think) October I posted a description of a problem with the
Sable's ethernet interface. (Periodically, and for no apparent reason,
inbound packets would get stuck. As soon as the system sent a packet
to some other machine, the inbound clog would clear.)
Through a combination of absentmindedness and overwork, I never did get
around to posting a summary. So better late than never, here it is ...
I got some really helpful replies from a couple of DEC folks (who shall
remain nameless to keep them from getting swamped with unsolicited mail).
The first reply I got said
| [...] I believe you're seeing a bug in the Tulip driver. One
| that was recently discovered, and that too quite by accident.
| (A line of code was deleted and did not get reinstated.)
| It has to do with the driver failing to reset a timer when the
| transmit ring transitions to an inactive state (0 entries pending).
| Each time a transmit packet is given to the device, a timer is
| reset to go off after 5 seconds. This timer therefore never goes
| off if the device is kept busy. If, however, a new transmit does
| not come in within 5 seconds of the last one, then the timer
| goes off and the interface is reset. I believe this reset is what
| causes things to get hung-up.
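
As a toy model of the behaviour described in that reply (not the actual
driver code, which is not shown in this thread), the Python sketch below
tracks a watchdog that is re-armed for 5 seconds on every transmit. The
class and method names are hypothetical, and the "fixed" variant, which
disarms the timer when the transmit ring drains, is an assumption about what
the patch does.

```python
TIMEOUT = 5.0  # seconds, per the description above


class TxWatchdog:
    def __init__(self, disarm_when_idle):
        self.disarm_when_idle = disarm_when_idle
        self.deadline = None      # time at which the timer would fire
        self.resets = 0           # number of interface resets triggered

    def transmit(self, now):
        self._check(now)
        self.deadline = now + TIMEOUT   # re-arm on every transmit

    def ring_empty(self, now):
        self._check(now)
        if self.disarm_when_idle:
            self.deadline = None        # nothing pending: no timer needed

    def _check(self, now):
        if self.deadline is not None and now >= self.deadline:
            self.resets += 1            # timer fired: interface gets reset
            self.deadline = None


def run(disarm_when_idle):
    wd = TxWatchdog(disarm_when_idle)
    wd.transmit(0.0)
    wd.ring_empty(0.1)   # packet sent, ring now idle
    wd.transmit(10.0)    # next transmit comes 10 s later
    return wd.resets


print(run(disarm_when_idle=False))  # 1: the idle gap caused a spurious reset
print(run(disarm_when_idle=True))   # 0
```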
The bug apparently first appeared in V2.0b, but was discovered too late
for a fix to make it into V3.0. Anyway, the helpful DEC person sent me
a patched version of the TULIP driver, and the problems disappeared. He
also mentioned that he had arranged for the patches to be made available
through Digital's Customer Support Center (for folks covered by a support
contract, of course). The relevant patch numbers are
OSFV20-065 (for OSF/1 V2.0b)
and
OSFV30-40 (for OSF/1 V3.0)
Mark Bartelt 416/978-5619
Canadian Institute for mark_at_cita.toronto.edu
Theoretical Astrophysics mark_at_cita.utoronto.ca
"Clothes not busy being worn are busy drying." - Dylan, on laundry day
[ singing "It's all right, ma (I'm only bleaching)" ]
===========================================================================
Subject: SUMMARY: tuo: packet dropped: no mbuf (again).
X-Url: http://www.ornl.gov/its/archives/mailing-lists/alpha-osf-managers/1995/07/msg00190.html
Thanks for the reply. DEC was very quick in getting back to me, and I was able
to ftp the patch, install it and rebuild the kernel within an hour of my call
to DEC. I am including the response I received from Matt Thomas describing the
patch.
thanks again,
dan cambron
ORIGINAL:
---------------------
>I included a previous summary for reference. I am at V3.2a on a 2100 using
>AdvFS and I'm still getting crashes and the message "tu0: packet dropped: no
>mbuf". The move to v3.2a doesn't seem to be working. Anything else I should
>do? Is there a patch to v3.2a? I also have a call in to DEC.
>thanks
>dan
REPLIES:
-----------------------------
There is a patch.
/usr/sys/BINARY/if_tu.o (USG-01533)
CHECKSUM: 33316 54
/usr/sys/data/if_tu_data.c
CHECKSUM: 13750 7
----------------------
Patch ID: OSF320-044, OSF320-059
The Tulip (DECchip 21040) driver does not support software selection of
the 10Base2 (Thinwire) and 10Base5 (Thickwire) ports. As per the Tulip
specification, this selection is expected to be carried out in hardware,
and is done so on the DE425 and DE435 modules produced by Digital.
In the absence of a jumper solution or auto-sensing hardware, software can
also select between the 10Base2 and 10Base5 ports if the hardware
implementation utilizes a certain (undocumented) feature of the chip.
In particular, the 3-port PCI Ethernet card made by Standard Microsystems
Corporation (SMC) makes use of this feature, and the driver as shipped
today (since V2.0B), cannot select between the two AUI ports on this module.
This patch contains an enhanced media-sensing algorithm to allow software
selection of the 10Base2 and 10Base5 ports. This improved algorithm will
also provide better diagnostics on boards that use a jumper (such as the DE425
and DE435). For example, the driver will now warn the user if the jumper
position was set for Thinwire but no cable was connected to that port.
The driver will now display the following message:
tu0: auto sensing: selected BNC (10Base2) port: no carrier
This patch also contains a fix for a problem where the driver will print out
'packet dropped: no mbuf' messages to the console repeatedly. While this
happens, the system becomes unusable for all other activity and is effectively
hung from a user's point-of-view.
A kernel rebuild is required.
===========================================================================
Received on Fri Dec 22 1995 - 05:08:18 NZDT