
15 Socket Programming

This chapter shows how to use sockets for programming network clients and servers. Advanced I/O techniques for sockets are described, including non-blocking I/O and control over I/O buffering. Tcl commands: socket.

Sockets are network communication channels. The sockets described in this chapter use the TCP network protocol, although you can find Tcl extensions that create sockets using other protocols. TCP provides a reliable byte stream between two hosts connected to a network. TCP handles all the issues about routing information across the network, and it automatically recovers if data is lost or corrupted along the way. TCP is the basis for other protocols like Telnet, FTP, and HTTP.

A Tcl script can use a network socket just like an open file or pipeline. Instead of using the Tcl open command, you use the socket command to open a socket. Then you use gets, puts, and read to transfer data. The close command closes a network socket.

Network programming distinguishes between clients and servers. A server is a process or program that runs for long periods of time and controls access to some resource. For example, an FTP server governs access to files, and an HTTP server provides access to hypertext pages on the World Wide Web. A client typically connects to the server for a limited time in order to gain access to the resource. For example, when a Web browser fetches a hypertext page, it is acting as a client. The extended examples in this chapter show how to program the client side of the HTTP protocol. A server for HTTP is illustrated in Chapter X.

Client Sockets

A client opens a socket by specifying the host address and port number for the server of the socket. The host address gives the network location (i.e., which computer) and the port selects a particular server from all the possible servers that may be running on that host. For example, HTTP servers typically use port 80, while FTP servers use port 21. The following example shows how to open a client socket to a web server:

set s [socket www.sun.com 80]

There are two forms for host names. The previous example uses a domain name: www.sun.com. You can also specify raw IP addresses, which are specified with 4 dot-separated integers (e.g., 128.15.115.32). A domain name is mapped into a raw IP address by the system software, and it is almost always a better idea to use a domain name in case the IP address assignment for the host changes. This can happen when hosts are upgraded or they move to a different part of the network. As of Tcl 7.6, there is no direct access from Tcl to the DNS service that maps host names to IP addresses.

Some systems also provide symbolic names for well-known port numbers. For example, instead of using 21 for the FTP service, you can use ftp. On UNIX systems the well-known port numbers are listed in the file named /etc/services.

Client Socket Options

The socket command accepts some optional arguments when opening the client side socket. The general form of the command is:

socket ?-async? ?-myaddr address? ?-myport myport? host port

Ordinarily the address and port on the client side are chosen automatically. If your computer has multiple network interfaces you can select one with the -myaddr option. The address value can be a domain name or an IP address. If your application needs a specific client port, it can choose one with the -myport option. If the port is in use, the socket command will raise an error.
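The effect of -myaddr can be observed without leaving your own machine. The following is a minimal sketch, assuming the loopback address 127.0.0.1 is available. The Accepted procedure and the done variable are names made up for this illustration, and port 0 asks the operating system to pick a free port for the throwaway listener:

```tcl
# Throwaway listener on the loopback interface (an assumption of this
# sketch). Port 0 lets the operating system choose a free port.
proc Accepted {sock addr port} {
	global done
	set done [list $addr $port]
	close $sock
}
set listener [socket -server Accepted -myaddr 127.0.0.1 0]
set serverPort [lindex [fconfigure $listener -sockname] 2]

# Choose the client-side interface explicitly with -myaddr.
set s [socket -myaddr 127.0.0.1 127.0.0.1 $serverPort]

# -sockname describes our end of the connection, -peername the other.
set local [fconfigure $s -sockname]
set peer [fconfigure $s -peername]
puts "local end: [lindex $local 0] port [lindex $local 2]"
puts "remote end: [lindex $peer 0] port [lindex $peer 2]"

# Let the event loop run the accept callback, then clean up.
vwait done
close $s
close $listener
```

The client port reported by -sockname is the one chosen automatically. Adding -myport to the socket command would request a specific port instead.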

In some cases it can take a long time to open the connection to the server. The -async option causes the connection to happen in the background, and the socket command returns immediately. The socket becomes writable when the connection completes or fails. You can use fileevent to get a callback when this occurs. If you use the socket before the connection completes, and the socket is in blocking mode, then Tcl automatically blocks and waits for the connection to complete. If the socket is in non-blocking mode, attempts to use the socket return immediately. In this situation gets (called with a variable name) would return -1, read would return an empty string, and fblocked would return 1. The following trivial example illustrates -async. The only advantage of this approach is that the Tcl event loop is active while your application waits for the connection:

set sock [socket -async host port]
fileevent $sock writable {set connected 1}
vwait connected
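Because the socket becomes writable whether the connection succeeds or fails, a callback usually needs to tell the two cases apart. The sketch below is one way to do that, assuming the -error option of fconfigure is available for sockets in your Tcl version. The Connected and Accepted procedures, the 5 second limit, and the loopback listener are all choices made for this illustration:

```tcl
# Loopback listener so the sketch does not depend on an outside host.
proc Accepted {sock addr port} { close $sock }
set listener [socket -server Accepted -myaddr 127.0.0.1 0]
set port [lindex [fconfigure $listener -sockname] 2]

proc Connected {sock} {
	global connected
	# Writable fires on success and on failure; -error tells them apart.
	set err [fconfigure $sock -error]
	if {[string length $err] == 0} {
		set connected ok
	} else {
		set connected "failed: $err"
	}
	# One-shot handler: remove it so it does not fire again.
	fileevent $sock writable {}
}

set sock [socket -async 127.0.0.1 $port]
fileevent $sock writable [list Connected $sock]
# Give up after 5 seconds rather than waiting forever.
set timer [after 5000 {set connected timeout}]
vwait connected
after cancel $timer
puts "connection attempt: $connected"
close $sock
close $listener
```

The timer is what the event loop buys you here: a blocking socket command could not be interrupted this way.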

Server Sockets

A server socket is a little more complex because it has to allow for multiple clients. The way this works is that the socket command creates a listening socket, and then new sockets are created when clients make connections to the server. Tcl takes care of all the details and makes this easy to use. You give the socket command a callback to execute when a client connects to your server socket. The callback is just a Tcl command. It gets as arguments the new socket, and the address and port number of the connecting client. A simple example is shown below:

Example 15-1 Opening a server socket.

set mainSocket [socket -server Accept 2540]
proc Accept {newSock addr port} {
	puts "Accepted $newSock from $addr port $port"
}
vwait forever

This example creates a server socket and specifies the Accept command as the server callback. In this simple example, Accept just prints out its arguments. The last argument to the socket command is the server's port number. For your own unofficial servers, you'll need to pick port numbers higher than 1024 ¹ to avoid conflicts with existing services.

The vwait command puts Tcl into its event loop so it can do the background processing necessary to accept connections. The vwait command will wait until the forever variable is modified, which won't happen in this simple example. The key point is that Tcl processes other events (e.g., network connections and other file I/O) while it waits. If you have a Tk application (e.g., wish), then it already has an event loop to handle window system events, so you do not need to use vwait. The Tcl event loop is discussed on page 155.
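To watch vwait return, the callback has to modify the variable at some point. The following variation is only a sketch: it counts connections and stops after the second one. The count variable, the two throwaway clients in the same process, and the use of the loopback address with an operating-system-chosen port (port 0) are assumptions made so the script runs on its own:

```tcl
proc Accept {newSock addr port} {
	global count forever
	incr count
	puts "Accepted connection $count from $addr port $port"
	close $newSock
	if {$count >= 2} {
		# Modifying the variable is what makes vwait return.
		set forever done
	}
}
set count 0
set mainSocket [socket -server Accept -myaddr 127.0.0.1 0]
set port [lindex [fconfigure $mainSocket -sockname] 2]

# Two throwaway clients. Their connections are queued by the
# operating system until the event loop runs the Accept callback.
set c1 [socket 127.0.0.1 $port]
set c2 [socket 127.0.0.1 $port]

# Safety net so the sketch cannot hang.
set timer [after 5000 {set forever timeout}]
vwait forever
after cancel $timer
close $c1
close $c2
close $mainSocket
```

Nothing is accepted until vwait is entered, which shows that the Accept callback runs only from the event loop.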

Server Socket Options

By default, Tcl lets the operating system choose the network interface used for the server socket, and you just supply the port number. If your computer has multiple interfaces you may want to specify a particular one. Use the -myaddr option for this. The general form of the command to open server sockets is:

socket -server callback ?-myaddr address? port

The Echo Service

This section presents a simple echo server. The echo server accepts connections from clients. It reads data from the clients and writes that data back. The example uses fileevent to wait for data from the client, and it uses fconfigure to adjust the buffering behavior of the network socket. You can use this example as a template for more interesting services.

Example 15-2 The Echo Server.

proc Echo_Server {port} {
	global echo
	set echo(main) [socket -server EchoAccept $port]
}
proc EchoAccept {sock addr port} {
	global echo
	puts "Accept $sock from $addr port $port"
	set echo(addr,$sock) [list $addr $port]
	fconfigure $sock -buffering line
	fileevent $sock readable [list Echo $sock]
}
proc Echo {sock} {
	global echo
	if {[eof $sock] || [catch {gets $sock line}]} {
		# end-of-file or abnormal connection drop
		close $sock
		puts "Close $echo(addr,$sock)"
		unset echo(addr,$sock)
	} else {
		if {[string compare $line "quit"] == 0} {
			# Prevent new connections.
			# Existing connections stay open.
			close $echo(main)
		}
		puts $sock $line
	}
}

The Echo_Server procedure opens the socket and saves the result in echo(main). When this socket is closed later, the server stops accepting new connections but existing connections won't be affected. If you want to experiment with this server, start it and wait for connections like this:

Echo_Server 2540

vwait forever

The EchoAccept procedure uses the fconfigure command to set up line buffering. This means that each puts by the server results in a network transmission to the client. The importance of this will be described in more detail later. A complete description of the fconfigure command is given on Page 158. The EchoAccept procedure uses the fileevent command to register a procedure that handles I/O on the socket. In this example, the Echo procedure will be called whenever the socket is readable. Note that it is not necessary to put the socket into non-blocking mode when using the fileevent callback. The effects of non-blocking mode are discussed on page 159.

EchoAccept saves information about each client in the echo array. This is just used to print out a message when a client closes its connection. In a more sophisticated server, however, you may need to keep more interesting state about each client. The name of the socket provides a convenient handle on the client. In this case it is used as part of the array index.

The Echo procedure first checks to see if the socket has been closed by the client or there is an error when reading the socket. The if expression only does the gets if the eof does not return true:

if {[eof $sock] || [catch {gets $sock line}]} {

Closing the socket automatically clears the fileevent registration.

If you forget to close the socket upon the end-of-file condition, the Tcl event loop will invoke your callback repeatedly. It is important to close it when you detect end of file.

In the normal case the server simply reads a line with gets and then writes it back to the client with puts. If the line is "quit", then the server closes its main socket. This prevents any more connections by new clients, but it doesn't affect any clients that are already connected.

Example 15-3 A client of the Echo Service.

proc Echo_Client {host port} {
	set s [socket $host $port]
	fconfigure $s -buffering line
	return $s
}
set s [Echo_Client localhost 2540]
puts $s "Hello!"
gets $s
=> Hello!

Example 15-3 shows a sample client of the Echo service. The main point is to ensure the socket is line buffered so that each puts by the client results in a network transmission. (Or, more precisely, each newline character results in a network transmission.) If you forget to set line buffering with fconfigure, the client's gets command will probably hang because the server will not get any data; it will be stuck in buffers on the client.
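If you want to try the round trip in a single tclsh, the client cannot simply block in gets, because the server's callbacks only run from the event loop of the same process. The following sketch works around that by giving the client a fileevent handler too. It uses a cut-down stand-in for the echo server; MiniAccept, MiniEcho, and GotReply are names invented for this illustration, and the loopback address and port 0 are assumptions:

```tcl
# Cut-down stand-in for the echo server.
proc MiniAccept {sock addr port} {
	fconfigure $sock -buffering line
	fileevent $sock readable [list MiniEcho $sock]
}
proc MiniEcho {sock} {
	if {[catch {gets $sock line}] || [eof $sock]} {
		close $sock
	} else {
		puts $sock $line
	}
}
# Client-side handler: store the first line that comes back.
proc GotReply {sock} {
	global reply
	if {[gets $sock line] >= 0} {
		set reply $line
	} else {
		set reply {}
	}
}

set listener [socket -server MiniAccept -myaddr 127.0.0.1 0]
set port [lindex [fconfigure $listener -sockname] 2]

set s [socket 127.0.0.1 $port]
fconfigure $s -buffering line
fileevent $s readable [list GotReply $s]
puts $s "Hello!"

# Run the event loop until the echo arrives or 5 seconds pass.
set timer [after 5000 {set reply timeout}]
vwait reply
after cancel $timer
puts "echoed: $reply"
close $s
close $listener
```

With the client and server in separate processes, the simpler blocking gets of the client example is all you need.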

Fetching a URL with HTTP

The Hypertext Transfer Protocol (HTTP) is the protocol used on the World Wide Web. This section presents a procedure to fetch pages or images from a server on the Web. Items on the Web are identified with a Uniform Resource Locator (URL) that specifies a host, port, and location on the host. The basic outline of HTTP is that a client sends a URL to a server, and the server responds with some header information and some content data. The header information describes the content, which can be hypertext, images, PostScript, and more.

Example 15-4 Opening a connection to an HTTP server.

proc Http_Open {url} {
	global http
	if {![regexp -nocase {^(http://)?([^:/]+)(:([0-9]+))?(/.*)} \
			$url x protocol server y port path]} {
		error "bogus URL: $url"
	}
	if {[string length $port] == 0} {
		set port 80
	}
	set sock [socket $server $port]
	puts $sock "GET $path HTTP/1.0"
	puts $sock "Host: $server"
	puts $sock "User-Agent: Tcl/Tk Http_Open"
	puts $sock ""
	flush $sock
	return $sock
}

The Http_Open procedure uses regexp to pick out the server and port from the URL. This regular expression is described in detail on Page 119. The leading http:// is optional, and so is the port number. If the port is left off, then the standard port 80 is used. If the regular expression matches, then a socket command opens the network connection.
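The URL parsing can be tried on its own, without opening any connection. The sketch below wraps the same pattern in a hypothetical helper named UrlSplit, which is not part of the examples in this chapter. It keeps the repetition inside the port capture, ([0-9]+), so that a multi-digit port is captured whole:

```tcl
# UrlSplit is a made-up helper for experimenting with the pattern.
# It returns a list of server, port, and path.
proc UrlSplit {url} {
	if {![regexp -nocase {^(http://)?([^:/]+)(:([0-9]+))?(/.*)} \
			$url x protocol server y port path]} {
		error "bogus URL: $url"
	}
	if {[string length $port] == 0} {
		# No port in the URL, so use the standard HTTP port.
		set port 80
	}
	return [list $server $port $path]
}

puts [UrlSplit http://www.sun.com/]
puts [UrlSplit http://www.sun.com:8080/index.html]
puts [UrlSplit www.sun.com/docs/]
```

Note that the pattern requires a path, so a URL with no slash after the host name is rejected as bogus.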

The protocol begins with the client sending a line that identifies the command (GET), the path, and the protocol version. The path is the part of the URL after the server and port specification. The rest of the request is lines in the following format:

key: value

The Host line identifies the server by name, which supports hosts that implement more than one server name. The User-Agent identifies the client program, which is often a browser like Netscape or Internet Explorer. The key-value lines are terminated with a blank line. This data is flushed out of the Tcl buffering system with the flush command. The server will respond by sending the URL contents back over the socket. This is described shortly, but first we consider proxies.
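To see exactly what goes over the wire, it can help to build the request as a string before writing it to a socket. The following sketch does only that. HttpRequestText is a hypothetical helper, not one of the chapter's procedures, and it returns the same lines that the puts commands in Http_Open produce:

```tcl
# Build the request text: a command line, key-value lines,
# and a blank line that terminates the header.
proc HttpRequestText {command path server} {
	set lines [list \
		"$command $path HTTP/1.0" \
		"Host: $server" \
		"User-Agent: Tcl/Tk Http_Open" \
		""]
	# The trailing newline is the one puts would add after the
	# empty string that forms the blank line.
	return [join $lines \n]\n
}

puts -nonewline [HttpRequestText GET /index.html www.sun.com]
```

Building the text first also makes it easy to log a request, or to compare it with what a server expects, before any network code is involved.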

Proxy Servers

A proxy is used to get through firewalls that many organizations set up to isolate their network from the main Internet. The proxy accepts HTTP requests from clients inside the firewall, and then forwards the requests outside the firewall. It also relays the server's response back to the client. The protocol is nearly the same when using the proxy. The difference is that the complete URL is passed to the GET command so the proxy can locate the server. Example 15-5 generalizes Http_Open to work with proxies. It tries to connect directly to the server. On some systems it is necessary to attempt to write data to a socket before an error from an unreachable host is raised. If an error is raised, then the proxy is used:

Example 15-5 Opening a connection to an HTTP server.

# Http_Proxy sets or queries the proxy
proc Http_Proxy {{new {}}} {
	global http
	if {[string length $new] == 0} {
		return $http(proxy):$http(proxyPort)
	} else {
		regexp {^([^:]+):([0-9]+)$} $new x \
			http(proxy) http(proxyPort)
	}
}
proc Http_Open {url {command GET} {query {}}} {
	global http
	if {![regexp -nocase {^(http://)?([^:/]+)(:([0-9]+))?(/.*)} \
			$url x protocol server y port path]} {
		error "bogus URL: $url"
	}
	if {[string length $port] == 0} {
		set port 80
	}
	if {[catch {
		# Open, write, and flush to trigger errors with
		# unreachable hosts.
		set sock [socket $server $port]
		puts $sock "$command $path HTTP/1.0"
		flush $sock
	} err]} {
		# Unreachable server. Connect to the proxy server instead.
		if {![info exists http(proxy)]} {
			return -code error $err
		}
		set sock [socket $http(proxy) $http(proxyPort)]
		puts $sock "$command http://$server:$port$path HTTP/1.0"
	}
	puts $sock "User-Agent: Tcl/Tk Http_Open"
	puts $sock "Host: $server"
	puts $sock ""
	if {[string length $query] > 0} {
		puts $sock $query
	}
	flush $sock
	return $sock
}
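The only protocol difference between the direct and the proxied case is the first request line. The sketch below isolates that difference. RequestLine and its viaProxy flag are hypothetical names used here for illustration:

```tcl
# Return the first line of an HTTP/1.0 request.
proc RequestLine {command server port path {viaProxy 0}} {
	if {$viaProxy} {
		# A proxy needs the complete URL to locate the server.
		return "$command http://$server:$port$path HTTP/1.0"
	} else {
		# The server itself only needs the path.
		return "$command $path HTTP/1.0"
	}
}

puts [RequestLine GET www.sun.com 80 /index.html]
puts [RequestLine GET www.sun.com 80 /index.html 1]
```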

The HEAD Request

Example 15-5 parameterizes the HTTP protocol so the user of Http_Open can perform different operations. The GET operation fetches the contents of a URL. The HEAD operation just fetches the description of a URL, which is useful to validate a URL. The POST operation transmits query data to the server (e.g., values from a form), and also fetches the contents of the URL. All of these operations follow a similar protocol. The reply from the server is a status line followed by lines that have key-value pairs. This format is similar to the client's request. The reply header is followed by content data with GET and POST operations.

Example 15-6 implements the HEAD command, which does not involve any reply data:

Example 15-6 Http_Head validates a URL.

proc Http_Head {url} {
	upvar #0 $url state
	catch {unset state}		;# Reset state from an earlier request
	set state(sock) [Http_Open $url HEAD]
	set state(http) "500 unknown error"
	fileevent $state(sock) readable [list HttpHeader $url]
	# Specify the real name, not the upvar alias, to vwait
	vwait $url\(status)
	catch {close $state(sock)}
	return $state(http)
}
proc HttpHeader {url} {
	upvar #0 $url state
	if {[eof $state(sock)]} {
		set state(status) eof
		close $state(sock)
		return
	}
	if {[catch {gets $state(sock) line} nbytes]} {
		set state(status) error
		# Keep the list in key-value form so array set still works
		lappend state(headers) error $nbytes
		close $state(sock)
		return
	}
	if {$nbytes < 0} {
		# Read would block
		return
	} elseif {$nbytes == 0} {
		# Header complete
		set state(status) head
	} elseif {![info exists state(headers)]} {
		# Initial status reply from the server.
		# Drop a leading protocol version such as HTTP/1.0.
		regsub {^HTTP/[0-9.]+ +} $line {} line
		set state(http) $line
		set state(headers) [list http $line]
	} else {
		# Process key-value pairs
		if {[regexp {^([^:]+):(.*)$} $line x key value]} {
			lappend state(headers) [string tolower $key] \
				[string trim $value]
		}
	}
}

The Http_Head procedure uses Http_Open to contact the server. The HttpHeader procedure is registered as a fileevent handler to read the server's reply. A global array keeps state about each operation. The URL is used in the array name, and upvar is used to create an alias to the name (upvar is described on page 78):

upvar #0 $url state

You cannot use the upvar alias as the variable specified to vwait. Instead, you must use the actual name. The backslash turns off the array reference in order to pass the name of the array element to vwait, otherwise Tcl tries to reference url as an array:

vwait $url\(status)
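The interaction between the upvar alias and vwait can be tried without any network traffic. In the sketch below a timer stands in for the fileevent handler. Fetch_Sketch and FinishSketch are hypothetical names, and the 50 millisecond delay is arbitrary:

```tcl
proc Fetch_Sketch {url} {
	upvar #0 $url state
	catch {unset state}
	# A timer stands in for the I/O handler that finishes later.
	after 50 [list FinishSketch $url]
	# vwait needs the real global name: the array named by the URL.
	vwait $url\(status)
	return $state(status)
}
proc FinishSketch {url} {
	# The handler sets up its own alias to the same global array.
	upvar #0 $url state
	set state(status) done
}

puts [Fetch_Sketch http://www.sun.com/]
```

Inside both procedures the alias is the convenient way to read and write the array. Only vwait has to be given the real name.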

The HttpHeader procedure checks for special cases: end-of-file, an error on the gets, or a short read on a non-blocking socket. The header lines are put onto a list, state(headers), where the list values alternate between keys and values. The very first reply line contains a status code from the server that is in a different format than the rest of the header lines:

code message

The code is a 3 digit numeric code. 200 is OK. Codes in the 400's and 500's indicate an error. The codes are explained fully in RFC XXX that specifies HTTP 1.0. The first line is saved with the key http:

set state(headers) [list http $line]

The rest of the header lines are parsed into key-value pairs and appended onto state(headers). This format can be used to initialize an array:

array set header $state(headers)

When HttpHeader gets an empty line the header is complete and it sets the state(status) variable, which signals Http_Head. Finally, Http_Head returns the HTTP status line to its caller. The complete information about the request is still in the global array named by the URL. Example 15-7 illustrates the use of Http_Head:

Example 15-7 Using Http_Head.

set url http://www.sun.com/
set status [Http_Head $url]
=> 200 OK
upvar #0 $url state
array set info $state(headers)
parray info
info(http)             200 OK
info(server)           Apache ...
info(last-modified)    Nov ...
info(content-type)     text/html
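The alternating key-value format is easy to experiment with on a canned reply. The following sketch parses a reply held in a string rather than read from a socket. ParseReplyHeader and the sample reply text are made up for illustration. The status line here carries a protocol version in front of the code, as real servers send it, and the sketch strips that prefix:

```tcl
# Turn reply text into a list that alternates keys and values.
proc ParseReplyHeader {text} {
	set headers {}
	foreach line [split $text \n] {
		if {[string length $line] == 0} {
			# A blank line ends the header; content follows.
			break
		} elseif {[llength $headers] == 0} {
			# Status line: drop the protocol version.
			regsub {^HTTP/[0-9.]+ +} $line {} line
			lappend headers http $line
		} elseif {[regexp {^([^:]+):(.*)$} $line x key value]} {
			lappend headers [string tolower $key] [string trim $value]
		}
	}
	return $headers
}

set reply "HTTP/1.0 200 OK\nServer: Apache\nContent-Type: text/html\n\n<html>"
array set hdr [ParseReplyHeader $reply]
parray hdr
```

Lowercasing the keys means the caller can look up hdr(content-type) no matter how the server capitalizes the header name.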

The GET and POST Requests

Example 15-8 shows Http_Get, which implements the GET and POST requests. The difference between these is that POST sends query data to the server after the request header. Both operations get a reply from the server that is divided into a descriptive header and the content data. The Http_Open procedure sends the request and the query, if present. The HttpHeader procedure reads the reply header, and Http_Get arranges to read the content.

The descriptive header returned by the server is in the same format as the client's request. One of the key-value pairs returned by the server specifies the Content-Type of the URL. The content-types come from the MIME standard, which is described in RFC 1521 (???). Typical content-types are:

text/html - HyperText Markup Language (HTML), which is described on page 30.
text/plain - plain text with no markup.
image/gif - image data in GIF format.
image/jpeg - image data in JPEG format.
application/postscript - a PostScript document.
application/x-tcl - a Tcl program! This type is discussed in detail in Chapter 19.

Example 15-8 Http_Get fetches the contents of a URL.

proc Http_Get {url {query {}}} {
	upvar #0 $url state			;# Alias to global array
	upvar #0 $url\(sock) sock		;# Alias to array element
	catch {unset state}			;# Reset state. Aliases remain.
	if {[string length $query] > 0} {
		set sock [Http_Open $url POST $query]
	} else {
		set sock [Http_Open $url GET]
	}
	fileevent $sock readable [list HttpHeader $url]
	# Specify the real name, not the upvar alias, to vwait
	vwait $url\(status)
	set header(content-type) {}
	set header(http) "500 unknown error"
	# The header list is missing if the server closed early
	if {[info exists state(headers)]} {
		array set header $state(headers)
	}
	# Check return status.
	# 200 is OK, other codes indicate a problem.
	if {![string match 2* $header(http)]} {
		catch {close $sock}
		if {[info exists header(location)] &&
				[string match 3* $header(http)]} {
			# 3xx is a redirection to another URL
			set state(link) $header(location)
			return [Http_Get $header(location)]
		}
		return -code error $header(http)
	}
	# Set up to read the content data
	switch -glob -- $header(content-type) {
		text/* {
			# Read HTML into memory
			fconfigure $sock -blocking off
			fileevent $sock readable [list HttpGetText $url]
		}
		default {
			# Copy content data to a file
			fconfigure $sock -translation binary -blocking off
			set state(filename) [File_TempName http]
			if {[catch {open $state(filename) w} out]} {
				set state(status) error
				set state(error) $out
				close $sock
				return $header(content-type)
			}
			set state(fd) $out
			fileevent $sock readable [list HttpCopyData $url]
		}
	}
	vwait $url\(status)
	return $header(content-type)
}

Http_Get uses Http_Open to initiate the request, and then it looks for errors. It handles redirection errors that occur if a URL has changed. These have error codes that begin with 3. A common case of this is when a user omits the trailing slash on a URL (e.g. http://www.sun.com). Most servers respond with:

302 Document has moved

Location: http://www.sun.com/

If the content-type is text, then Http_Get sets up a fileevent handler to read this data into memory. The socket is put into non-blocking mode so the read handler can read as much data as possible each time it is called. This is more efficient than using gets to read a line at a time. The text will be stored in the state(text) variable for use by the caller of Http_Get. Example 15-9 shows the HttpGetText fileevent handler:

Example 15-9 HttpGetText reads text Content-Type URLs.

proc HttpGetText {url} {
	upvar #0 $url state
	if {[eof $state(sock)]} {
		# Content complete
		set state(status) done
		close $state(sock)
	} elseif {[catch {read $state(sock)} block]} {
		set state(status) error
		# Keep the list in key-value form so array set still works
		lappend state(headers) error $block
		close $state(sock)
	} else {
		append state(text) $block
	}
}
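The same read-until-end-of-file pattern can be exercised against a local listener. The sketch below is not part of the HTTP code. SendText stands in for a server that writes a few lines and closes, ReadAll follows the shape of HttpGetText but stores its result in plain global variables, and the loopback address and port 0 are assumptions:

```tcl
# Stand-in for a server reply: several lines, then close.
proc SendText {sock addr port} {
	puts $sock "line one\nline two\nline three"
	close $sock
}
# Read whatever is available each time the socket is readable.
proc ReadAll {sock} {
	global text status
	if {[eof $sock]} {
		# Content complete
		set status done
		close $sock
	} elseif {[catch {read $sock} block]} {
		set status error
		close $sock
	} else {
		append text $block
	}
}

set listener [socket -server SendText -myaddr 127.0.0.1 0]
set port [lindex [fconfigure $listener -sockname] 2]

set text {}
set sock [socket 127.0.0.1 $port]
# Non-blocking mode lets read return just the data that has arrived.
fconfigure $sock -blocking off
fileevent $sock readable [list ReadAll $sock]

set timer [after 5000 {set status timeout}]
vwait status
after cancel $timer
close $listener
puts "status=$status, [string length $text] characters read"
```

The handler may run once or several times before end of file, depending on how the data arrives. Appending each block makes the result the same either way.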

The content may be in binary format. This poses a problem for Tcl 7.6 and earlier: A null character will terminate the value, so values with embedded nulls cannot be stored safely in Tcl variables. Tcl 8.0 supports strings and variable values with arbitrary binary data. The Http_Get procedure copies non-text content data to a temporary file. The HttpCopyData procedure shown in Example 15-10 uses an undocumented Tcl command, unsupported0, that copies data from one I/O channel to another without storing it in Tcl variables.

Example 15-10 HttpCopyData copies content to a file.

rename unsupported0 copychannel
proc HttpCopyData {url} {
	upvar #0 $url state
	if {[eof $state(sock)]} {
		# Content complete
		set state(status) done
		close $state(sock)
		close $state(fd)
	} elseif {[catch {copychannel $state(sock) $state(fd)} x]} {
		set state(status) error
		# Keep the list in key-value form so array set still works
		lappend state(headers) error $x
		close $state(sock)
		close $state(fd)
	}
}

The user of Http_Get uses the information in the state array to determine the status of the fetch and where to find the content. There are four cases to deal with:

There was an error, which is indicated by the presence of the state(error) element.
There was a redirection, in which case the new URL is in state(link). The client of Http_Get should change the url and look at its state instead. You can use upvar to redefine the alias for the state array:

upvar #0 $state(link) state

There was text content. The content is in state(text).
There was other content. The content is in a file named by state(filename).
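A caller might fold the four cases into one procedure. The following is only a sketch of that idea. ReportFetch is a hypothetical name, and the two state arrays are filled in by hand here instead of by Http_Get, so that the script runs without a network:

```tcl
# Summarize the outcome recorded in the state array for a URL.
proc ReportFetch {url} {
	upvar #0 $url state
	if {[info exists state(error)]} {
		return "error: $state(error)"
	} elseif {[info exists state(link)]} {
		# Redirection: look at the state of the new URL instead.
		return [ReportFetch $state(link)]
	} elseif {[info exists state(text)]} {
		return "text: [string length $state(text)] characters"
	} elseif {[info exists state(filename)]} {
		return "file: $state(filename)"
	}
	return "unknown"
}

# Hand-built state arrays stand in for two completed fetches:
# the first URL was redirected to the second, which returned text.
array set http://www.sun.com [list link http://www.sun.com/]
array set http://www.sun.com/ [list text "<html>Hello</html>"]
puts [ReportFetch http://www.sun.com]
```

A production version would also guard against a chain of redirections that loops back on itself, which this sketch does not do.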

The copychannel Command

Officially, the copychannel command is named unsupported0, and it is not guaranteed to stay around. It was introduced in Tcl 7.5 along with the socket command. It is still present in the alpha version of Tcl 8.0. It may be replaced by a more general binary I/O facility in the future. I insisted on having the copychannel functionality so Tcl scripts could implement HTTP and other protocols that use binary data without needing an extension. The general form of the command is:

unsupported0 input output ?chunksize?

The command reads from the input channel and writes to the output channel. The number of bytes transferred is returned. If chunksize is specified, then at most this many bytes are read from input. If input is in blocking mode, then unsupported0 will block until chunksize bytes are read, or until end of file. If input is non-blocking, all available data from input is read, up to chunksize bytes, and copied to output. If output is non-blocking, then unsupported0 queues all the data read from input and returns. Otherwise, unsupported0 could block when writing to output.

The name is easy to fix:

if {[info commands unsupported0] == "unsupported0"} {
	rename unsupported0 copychannel
}
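Because unsupported0 is not guaranteed to stay around, a script may want a fallback. The sketch below assumes an interpreter that provides the fcopy command, which is not covered in this chapter, and wraps it in a copychannel procedure with the same argument order. The file names are made up for the demonstration. Note that fcopy used this way blocks until the copy is complete, so it is not a drop-in replacement inside a non-blocking fileevent handler:

```tcl
if {[string length [info commands unsupported0]] > 0} {
	rename unsupported0 copychannel
} elseif {[string length [info commands fcopy]] > 0} {
	# Assumption: fcopy stands in for unsupported0 on this interpreter.
	proc copychannel {input output {chunksize -1}} {
		if {$chunksize < 0} {
			return [fcopy $input $output]
		}
		return [fcopy $input $output -size $chunksize]
	}
}

# Demonstration: copy binary data, including null characters,
# from one file to another without storing it in a Tcl variable.
set name copysketch_[pid]
set f [open $name.in w]
fconfigure $f -translation binary
puts -nonewline $f "some\0binary\0data"
close $f

set in [open $name.in r]
set out [open $name.out w]
fconfigure $in -translation binary
fconfigure $out -translation binary
set n [copychannel $in $out]
close $in
close $out
puts "copied $n bytes"
file delete $name.in $name.out
```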


¹ UNIX systems prevent user programs from opening server sockets with port numbers less than 1024.

welch@acm.org