Web page optimization and IP packet size.
I've read a lot about web page optimization, and always see people mention IP packets. What are those, and what do they have to do with optimizing my webpage?
If someone can clear this up, I'd appreciate it. I want to have optimized web pages but don't really get what this stuff is referring to.
What is IP, what is html optimization, what is http?
Hi superduper, that's a good question. Here's how it all works, more or less. Forgive any errors I make, some of this stuff is pretty technical.
How do TCP/IP and HTTP work?
A web page is a collection of files. Each file is transmitted over TCP/IP [ Transmission Control Protocol / Internet Protocol ], using HTTP [ Hypertext Transfer Protocol ].
Here is the relevant specification for how IP stuff works:
:: Quote ::The number 576 is selected to allow a reasonable sized data block to
be transmitted in addition to the required header information. For
example, this size allows a data block of 512 octets plus 64 header
octets to fit in a datagram. The maximal internet header is 60
octets, and a typical internet header is 20 octets, allowing a
margin for headers of higher level protocols.
src: Internet Protocol
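If I read this correctly, 576 bytes is the datagram size that every host has to be able to accept, so as a conservative working figure each packet carries about 512 bytes of data plus the header information. Many networks allow larger packets (Ethernet commonly carries up to 1500 bytes), but 576 is the safe lower bound. The IP headers are like the address on an envelope: they tell the network where the request came from, so it can be sent back, where the request is going, and what kind of data it is.
Here are the HTTP response headers for this page, which travel inside the packet data: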
:: Code ::Content-Type: text/html
Date: Sat, 16 Oct 2004 16:55:14 GMT
How to best optimize your web page?
The goal is to drop the number of required packets to the absolute minimum while keeping the overall look and feel as far as possible.
What happens when a web page is requested?
So there are several factors at work here:
First, the raw number of files being served. Each file is going to be a minimum of 1 IP packet, no matter how small it is. And, working from the 512-byte figure above, each file larger than 512 bytes is going to be split into separate IP packets. So for example a file that is 1025 bytes is going to be 3 packets, not 2.
You can estimate the total number of packets fairly easily this way: take each file, view its properties to see how many bytes it is, and divide by 512. That gives you how many packets that file will be, with any remainder in the division counting as one more packet.
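The divide-and-round-up rule can be sketched in a few lines of Python. This is only a rough estimate under the 512-byte figure used in this thread: the function names and the example file sizes are made up for illustration, and it ignores HTTP response headers and TCP overhead.

```python
DATA_BYTES_PER_PACKET = 512  # working figure from this thread, not a universal constant


def packets_for_file(size_bytes, payload=DATA_BYTES_PER_PACKET):
    """Estimate packets for one file: divide by the payload size and round up."""
    # Even an empty or tiny file costs at least one packet.
    return max(1, -(-size_bytes // payload))


def packets_for_page(file_sizes):
    """Sum the per-file estimates for every file that makes up the page."""
    return sum(packets_for_file(size) for size in file_sizes)


# Hypothetical page: 5,000 bytes of HTML, 8,000 bytes of CSS, one 1,025-byte image.
print(packets_for_page([5000, 8000, 1025]))
```

Passing a different `payload` value lets you redo the estimate for a network that carries larger packets.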
On page optimization
With careful HTML/CSS coding, using as much nesting of CSS as possible, you can get a heavily styled CSS/P [CSS positioned div] web page delivered in about 30 kB, including the graphics if you do them well. That includes the text content. Any JPEG photos will of course be added on to that total. JPEG photos can also be radically optimized with almost no visible loss: most JPEGs on the web are far too large. If you save them at about 70% JPEG quality, the results are almost identical to 80-85% quality as far as the end user is concerned, but the file size is about 50% smaller.
A reasonably small amount of attention to these kinds of issues can drop dialup page load times from up to 60 seconds on first page load to about 6-8 seconds.
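To see where numbers like these come from, here is a back-of-the-envelope sketch. The 40,000 bits per second throughput is my assumption for a typical effective dialup speed, the 300 kB page is a hypothetical unoptimized one, and the calculation ignores latency and connection setup, so treat the results as rough orders of magnitude only.

```python
def estimate_seconds(page_bytes, bits_per_second=40_000):
    """Rough transfer time: total bits divided by line throughput."""
    return page_bytes * 8 / bits_per_second


# Compare the 30 kB target with a hypothetical 300 kB page.
for label, size in [("30 kB page", 30 * 1024), ("300 kB page", 300 * 1024)]:
    print(f"{label}: about {estimate_seconds(size):.1f} s")
```

Under that assumption the 30 kB page comes out at roughly six seconds, which is in line with the 6-8 second figure.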
Images tend to be the worst offenders. A well designed and coded HTML page should run about 5 kB for the HTML markup, give or take a few kB. The CSS should run under 5-10 kB depending on the complexity of the page.
If you have graphics, use 8-bit .png compression instead of .gif. PNG is significantly sharper, and has smaller file sizes, than .gif, with the one exception of small images of three colors or fewer, where gif tends to be smaller.
PNG is also a fully open standard, which is what an open source project should be using. Plus it's just superior in general. Again, that's 8-bit PNG.
Browser caching of files
Most browsers have a default setting to only request a new version of the files needed to construct the web page when the version on the web server is newer than the version in the browser cache. You can use this fact to your advantage in two ways:
1. The user only needs to load the main components of your page once, on the first page they visit.
2. You reduce your server bandwidth load dramatically. For large, popular sites, this can be a major consideration.
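The "only fetch it if the server copy is newer" check can be sketched roughly like this. It is a simplified illustration rather than how any particular browser or server implements it: `is_not_modified` is a made-up helper name, and it assumes the server knows the file's modification time as a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_not_modified(if_modified_since, last_modified):
    """Return True when the cached copy is still current (a 304 reply would do)."""
    if not if_modified_since:
        return False  # no cached copy announced: send the full file
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparseable date: play safe and send the full file
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates only have one-second resolution.
    return last_modified.replace(microsecond=0) <= since


file_changed = datetime(2004, 10, 16, 16, 0, 0, tzinfo=timezone.utc)
print(is_not_modified("Sat, 16 Oct 2004 16:55:14 GMT", file_changed))
```

When the function returns True, the server can answer with a short 304 Not Modified response instead of resending the file, which is where the bandwidth saving comes from.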
Hope that helps clear it up, superduper.
One thing that many PHP webmasters don't realize is that PHP, by default, doesn't send out any headers relevant to caching since it doesn't have any mechanisms in place to know when the page really was last modified, or when it's expected to expire.
Adding caching headers adds an extra level of complexity and some potential pitfalls to a site, but can improve the site's performance for the end user remarkably, since pages which have already been visited don't need to be requested again, and the user's ISP may also have a copy of the page. ISP caching alone can reduce the number of requests significantly, which also speeds up the pages that do get requested.
Caching headers include Last-Modified, Expires, Cache-Control, and ETag (the most difficult one to implement). Implementing a good third-party cache optimization system probably also includes responding correctly to If-Modified-Since and If-None-Match (for those ETags).
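As a rough illustration of what the first three of those headers look like when they are sent, here is a small Python sketch (Python rather than PHP, just to keep it short). `caching_headers` is a hypothetical helper, and the one-hour lifetime is an arbitrary example value, not a recommendation.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def caching_headers(last_modified, max_age_seconds, now=None):
    """Build Cache-Control, Expires and Last-Modified header values."""
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(seconds=max_age_seconds)
    return {
        "Cache-Control": f"max-age={max_age_seconds}",
        # usegmt=True produces dates like "Thu, 26 Aug 2004 02:35:54 GMT".
        "Expires": format_datetime(expires, usegmt=True),
        "Last-Modified": format_datetime(last_modified, usegmt=True),
    }


modified = datetime(2004, 8, 26, 2, 35, 54, tzinfo=timezone.utc)
for name, value in caching_headers(modified, 3600).items():
    print(f"{name}: {value}")
```

Both datetimes have to be in UTC here, since HTTP dates are always expressed in GMT.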
One thing you can do is use a caching script. This not only caches your pages locally, which speeds things up on the server front, it also covers the ETag business and If-None-Match, and maybe Last-Modified and If-Modified-Since too if you're lucky. Unfortunately I don't have any real 'bulletproof' options to suggest at the moment: each caching script I've looked at has had its warts or needed to be modified slightly. At the moment I'm using jpcache with a few slight alterations and it seems to be working well.
Techpatterns uses Last-Modified, but I haven't checked into Expires, Cache-Control, and ETag. I've seen them but never looked into them. Even with just Last-Modified, performance goes way up. I've noticed that Google only intermittently identifies the document file creation date correctly, though, and haven't seen a real pattern to that. Maybe some of the googlebots trigger on Last-Modified and others don't, hard to say.
Any more information on those others would be interesting to read, minck, thanks, jeffd
Cache-Control: max-age is just the maximum age the page can have in seconds, relative to the time of the request.
Expires: just like it sounds
Last-Modified: also like it sounds
Both Expires and Last-Modified need dates in the format Thu, 26 Aug 2004 02:35:54 GMT
ETag: a unique identifier for page content that's sent with the page. If a cache holds a copy of the page with an associated ETag, it sends that ETag along with the request as an 'If-None-Match' header. Apache and most other modern servers deal with this just fine, producing ETags and reacting properly to If-None-Match. Getting PHP to do this means a whole extra layer of programming, or adding extra functionality to the part of your script that handles requests (could mean building a whole layer, or using Java-like MVC OO methods). I must say, though, jpcache does this fairly nicely if you modify it a tad, and is easy to install.
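The ETag round trip described above can be sketched like this. It is a minimal illustration and not how Apache or jpcache actually do it: `make_etag` and `respond` are made-up names, and hashing the page body with SHA-1 is just one possible way to get an identifier that changes when the content changes.

```python
import hashlib


def make_etag(body):
    """Derive a quoted ETag from the page content."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def respond(body, if_none_match=None):
    """Return (status, headers, payload) for a request."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match:
        # The header may list several tags, and weak tags carry a W/ prefix.
        tags = [t.strip() for t in if_none_match.split(",")]
        tags = [t[2:] if t.startswith("W/") else t for t in tags]
        if "*" in tags or etag in tags:
            return 304, headers, b""  # cached copy is still good
    return 200, headers, body


status, headers, _ = respond(b"<html>hello</html>")
print(status, headers["ETag"])
print(respond(b"<html>hello</html>", if_none_match=headers["ETag"])[0])
```

The first call plays the part of a fresh visit, the second a cache revalidating its copy. As soon as the page content changes, the hash changes and the full page is sent again.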
mnot.net has got probably the best-known page on getting your web pages cacheable in theory and practice.
Posted: Mar 25, 05, 20:38 techAdmin
I found another decent article on TCP/IP MTU Datagram sizes.
:: Quote ::Even though we can usually consider the TCP/IP internet to be like a large, abstract “virtual network” of devices, we must always remember that underneath the network layer, data always travels across one or more physical networks.
Because of this, a common default datagram size is 576 bytes: 512 for the data, and the rest for the headers, which tell it where to go, what to do, and so on.
:: Quote ::Each router must be able to fragment as needed to handle IP datagrams up to the size of the largest MTU used by networks to which they attach. Routers are also required, as a minimum, to handle an MTU of at least 576 bytes. This value is specified in RFC 791, and was chosen to allow a “reasonable sized” data block of at least 512 bytes, plus room for the standard IP header and options. Since it is the minimum size specified in the IP standard, 576 bytes has become a common default MTU value used for IP datagrams.
That's a pretty clear explanation of all this.
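To make the fragmentation part a bit more concrete, here is a rough sketch of how a router might split one datagram's data to fit a smaller MTU. It assumes the typical 20-byte IP header with no options, and `fragment_payload_sizes` is a name I made up. The one real rule it relies on is that fragment offsets are counted in 8-byte units, so every fragment except the last carries a multiple of 8 data bytes.

```python
def fragment_payload_sizes(data_len, mtu=576, header_len=20):
    """Split data_len bytes of datagram data into per-fragment data sizes."""
    # Largest multiple of 8 that fits in the MTU after the IP header.
    per_fragment = (mtu - header_len) // 8 * 8
    if per_fragment <= 0:
        raise ValueError("MTU too small for the header")
    sizes = []
    remaining = data_len
    while remaining > per_fragment:
        sizes.append(per_fragment)
        remaining -= per_fragment
    sizes.append(remaining)  # the last fragment takes whatever is left
    return sizes


# 1480 data bytes crossing a link with a 576-byte MTU.
print(fragment_payload_sizes(1480))
```

Each fragment gets its own copy of the IP header, so fragmenting adds overhead on top of the data itself.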
jeffd, that was an excellent post! It prompted me to take a look at my file sizes; one of them is a whopping 64k! I'll be working on bringing that one, and quite a few others, down closer to your 5k suggestion.