[Sockets] Web Programming with Sockets


  Have you ever dreamed of writing your own Apache web-server or Netscape browser? Writing the full application would be a huge undertaking, but programming a basic web-server or web-browser is not that difficult. It all starts with some basic socket programming in the good tradition of Stevens' classic book "Unix Network Programming". A web-server can be written in a couple of hundred lines of code, while a program that can fetch and store web-pages is of the same complexity. Of course doing all the graphics for rendering and displaying those pages would be way more work.

I have written programs of this type for a couple of reasons: it's a good learning experience; these programs often come in very handy as test tools; and, because they are easily customized, you can create programs that use the standard Internet protocols but provide functionality outside the scope of normal web-servers. As an example, with this code as a starting point it was easy to write a web-robot that can download a whole web-site.

The software comes in two very different flavors, and for each I have provided first a simple reader that gets html pages (or images) from a web-server and second a basic web-server that, given a directory tree with content, can answer and fulfill requests from web-browsers. The two flavors are Java code plus classes, which will run on any platform with a Java Virtual Machine, and two Microsoft Visual C 6.0 projects and .exe's that will run on the various Win platforms.


[wsrv]

Visual C - wsrv

The starting point for the code in this program was the book by Alok Kumar Sinha, "Network Programming in Windows NT". When you compare the code in his book with the code in Stevens' book, you see that most Unix socket calls have been implemented and are part of the winsock / wsock32 libraries. As a result, a web-server application in Windows can be nearly the same as one for Unix, with only some extra initialization and slightly different error handling.

My "wsrv" program comes in the form of a VC6 project. Double-click on wsrv.dsw to open the project, change the code as you wish, then recompile and test.

In its current form (and you can use the wsrv.exe program as such) you can start the program in an MS-DOS box on a Win platform by providing the root-directory of your web-site content. It will listen on port 80, but you can change that with the -p command line parameter. For the rest "wsrv -?" shows you the way.

When you request a URL ending with a / from this server, it will look for a file called index.html in that directory and, if it exists, reply with that file. If the file is not found, the server will not create a directory listing, but will return a 404 file-not-found error page.
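The lookup rule above can be sketched in a few lines. This is an illustration in Java, not the actual wsrv code (which is C). The class and method names are hypothetical, and the refusal of ".." in a path is an extra precaution that the text above does not mention.

```java
import java.io.File;

class PathResolver {

    // Turn a request path such as "/docs/" into a path relative to the
    // content root. A trailing "/" means "serve index.html from there".
    // Returns null for paths this sketch refuses to serve.
    static String toRelativePath(String requestPath) {
        if (requestPath == null || !requestPath.startsWith("/")) {
            return null;
        }
        if (requestPath.contains("..")) {
            return null; // assumption: never climb out of the content root
        }
        String path = requestPath.endsWith("/")
                ? requestPath + "index.html"
                : requestPath;
        return path.substring(1); // drop the leading "/"
    }

    // Returns the file to send, or null when the answer should be a 404.
    // No directory listing is ever generated.
    static File resolve(File rootDir, String requestPath) {
        String relative = toRelativePath(requestPath);
        if (relative == null) {
            return null;
        }
        File file = new File(rootDir, relative);
        return file.isFile() ? file : null;
    }

    public static void main(String[] args) {
        System.out.println(toRelativePath("/"));
        System.out.println(toRelativePath("/docs/"));
        System.out.println(toRelativePath("/docs/page.html"));
    }
}
```

A caller would send the file when resolve() returns one and the 404 error page when it returns null.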


[wget]

Visual C - wget

As expected, wget is the counterpart of wsrv and it shares much of the same code. With "wget.exe http://a-domain/a-path/a-file.html" you can download a web-page from a site and store it in a file. When you want to use it in combination with wsrv, you must be careful about which "a-domain" you use. Start by trying "localhost" or try the real name of your system. It is also a good idea to put some sensible entries in C:\Windows\hosts.

The program supports the use of ports different from 80 in the URL. There is also a parameter -h that controls whether the http-headers are stored in the file. The way this works is that when you are not interested in http-headers, you request a page with "GET /a-path/a-file.html", while when you are interested in the headers, you use the newer "GET /a-path/a-file.html HTTP/1.0" command. Finally, the program will by default store the web-page in a file with the same name, but you can influence this with the -f parameter.
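The two request forms and the port handling described above could look roughly like this. It is a sketch in Java rather than the C code of wget.exe, and the names RequestBuilder, buildRequest and portOf are made up for the illustration. Note that some present-day servers may no longer answer the short form without a version.

```java
import java.net.URI;

class RequestBuilder {

    // Short form: the server answers with the page only, no headers.
    // Long form: the server answers with a status line, headers, a blank
    // line, and then the page.
    static String buildRequest(String path, boolean wantHeaders) {
        if (wantHeaders) {
            // The extra blank line ends the (empty) list of request headers.
            return "GET " + path + " HTTP/1.0\r\n\r\n";
        }
        return "GET " + path + "\r\n";
    }

    // Port named in the URL, or 80 when the URL does not name one.
    static int portOf(String url) {
        int port = URI.create(url).getPort(); // -1 when no port is given
        return port == -1 ? 80 : port;
    }

    public static void main(String[] args) {
        String url = "http://localhost:8080/a-path/a-file.html";
        System.out.println("port: " + portOf(url));
        System.out.print(buildRequest(URI.create(url).getPath(), true));
    }
}
```

The returned string is what gets written to the socket once the connection to the host and port is open.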

I recently discovered that many Windows systems (at least Win95/98) have an incomplete "Services" file. Look for it in C:\Windows\ or one of its subdirectories. In NT4 Services resides in "...\drivers\etc". What is missing is an entry for http. Therefore, after the entry for finger, add a line with "http 80/tcp". Both wsrv and wget rely on this entry.


[WServer]

Java - WServer

Writing this type of program in Java is much simpler and quicker than doing it in C. The reason is that many Java classes are already Internet protocol aware, so opening a socket connection with a server on the net takes no time at all in Java. This web-server is still rather basic, but on the other hand fully functional. A major benefit of this server program over the VC6 code is that it is multi-threaded. For each client a new thread is created to handle that request independently of, and in parallel with, other requests. Again, in Java that is much easier to implement than in C.
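The thread-per-client idea described above can be sketched as follows. This is not the WServer source. The class and method names are hypothetical, the lambda syntax needs a much newer Java than the one this page was written for, and the sketch sends one fixed page instead of reading files from a directory tree.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class ThreadPerClientSketch {

    // Accept clients forever; each one is handed to its own thread,
    // so a slow client does not hold up the others.
    static void serve(ServerSocket listener) throws IOException {
        while (true) {
            Socket client = listener.accept();
            new Thread(() -> handle(client)).start();
        }
    }

    // Read one request, send one fixed page, close the connection.
    static void handle(Socket client) {
        try (Socket c = client) {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(c.getInputStream(), StandardCharsets.ISO_8859_1));
            String requestLine = in.readLine();
            if (requestLine == null) {
                return; // client closed without sending anything
            }
            if (requestLine.contains(" HTTP/")) {
                // Request with a version: skip header lines up to the blank line.
                String line;
                while ((line = in.readLine()) != null && !line.isEmpty()) {
                    // headers are ignored in this sketch
                }
            }
            OutputStream out = c.getOutputStream();
            out.write(response("<html><body>Hello</body></html>"));
            out.flush();
        } catch (IOException e) {
            System.err.println("client failed: " + e.getMessage());
        }
    }

    // Status line, two headers, blank line, then the page itself.
    static byte[] response(String body) {
        byte[] content = body.getBytes(StandardCharsets.ISO_8859_1);
        String head = "HTTP/1.0 200 OK\r\n"
                + "Content-Type: text/html\r\n"
                + "Content-Length: " + content.length + "\r\n\r\n";
        byte[] headBytes = head.getBytes(StandardCharsets.ISO_8859_1);
        byte[] all = new byte[headBytes.length + content.length];
        System.arraycopy(headBytes, 0, all, 0, headBytes.length);
        System.arraycopy(content, 0, all, headBytes.length, content.length);
        return all;
    }
}
```

To try it, call ThreadPerClientSketch.serve(new ServerSocket(8080)) from a main method. The port 8080 is only an example, chosen because ports below 1024 often need special rights.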

You start the server from the command line with "java WServer . It will run on any platform that has a Java Virtual Machine installed. Sometimes, like when using jre to run the program, you must specify a CLASSPATH to the directory that contains your application class-files.


[WClient]

Java - WClient

On the client side, WClient retrieves a web-page from the net, and does all that in a good 50 lines of Java code. After the file is retrieved, you have the choice to either just display the html-code or to store it in a file. Based on code like this I wrote the aforementioned web-robot. It just involves parsing the html-code and then recursively retrieving and storing more html-files.
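The parsing step of such a robot can be as small as the sketch below. It is not the robot mentioned above, just an illustration with hypothetical names. It assumes that a regular expression over href attributes is good enough, which holds for simple pages but not for every piece of html found on the net.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkExtractor {

    // Matches href="..." or href='...' and captures the target,
    // stopping at the closing quote or at a "#" fragment.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    // Collect the links that look like more html pages to fetch.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String link = m.group(1);
            if (link.endsWith(".html") || link.endsWith("/")) {
                links.add(link);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String page = "<a href=\"a.html\">A</a> <A HREF='b/'>B</A> <img src=\"c.png\">";
        for (String link : extractLinks(page)) {
            System.out.println(link);
        }
    }
}
```

A robot would fetch each extracted link in turn, store the page, and run the same extraction on it, while keeping a set of pages already visited so that it does not loop forever.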

Have fun !!




  Willem van Schaik, Calgary, April 2001     http://www.schaik.com/wwwillem.html