A Tcl script can use a network socket just like an open file or pipeline. Instead of using the Tcl open command, you use the socket command to open a socket. Then you use gets, puts, and read to transfer data. The close command closes a network socket.
Network programming distinguishes between clients and servers. A server is a process or program that runs for long periods of time and controls access to some resource. For example, an FTP server governs access to files, and an HTTP server provides access to hypertext pages on the World Wide Web. A client typically connects to the server for a limited time in order to gain access to the resource. For example, when a Web browser fetches a hypertext page, it is acting as a client. The extended examples in this chapter show how to program the client side of the HTTP protocol. A server for HTTP is illustrated in Chapter X.
set s [socket www.sun.com 80]
There are two forms for host names. The previous example uses a domain name: www.sun.com. You can also specify raw IP addresses, which are written as four dot-separated integers (e.g., 128.15.115.32). A domain name is mapped into a raw IP address by the system software, and it is almost always a better idea to use a domain name in case the IP address assignment for the host changes. This can happen when hosts are upgraded or moved to a different part of the network. As of Tcl 7.6, there is no direct access from Tcl to the DNS service that maps host names to IP addresses.
Some systems also provide symbolic names for well-known port numbers. For example, instead of using 21 for the FTP service, you can use ftp. On UNIX systems the well-known port numbers are listed in the file named /etc/services.
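The socket command raises a Tcl error if the host name cannot be resolved or the connection is refused, so it is usually worth wrapping it in catch. The following is a minimal sketch, not part of the original examples; the TryConnect name is hypothetical:

```tcl
# Hypothetical helper: open a client socket, or return an empty
# string (after printing the reason) if the connection fails.
proc TryConnect {host port} {
    if {[catch {socket $host $port} result]} {
        puts stderr "cannot connect to $host:$port: $result"
        return {}
    }
    # On success, result holds the name of the new socket channel
    return $result
}
```

A caller can then test the result with string length before using the socket.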
socket ?-async? ?-myaddr address? ?-myport myport? host port
Ordinarily the address and port on the client side are chosen automatically. If your computer has multiple network interfaces you can select one with the -myaddr option. The address value can be a domain name or an IP address. If your application needs a specific client port, it can choose one with the -myport option. If the port is in use, the socket command will raise an error.
In some cases it can take a long time to open the connection to the server. The -async option causes the connection to happen in the background, and the socket command returns immediately. The socket becomes writable when the connection completes or fails. You can use fileevent to get a callback when this occurs. If you use the socket before the connection completes, and the socket is in blocking mode, then Tcl automatically blocks and waits for the connection to complete. If the socket is in non-blocking mode, attempts to use the socket return immediately. In this situation gets would return -1, read would return an empty string, and fblocked would return 1. The following trivial example illustrates -async. The only advantage of this approach is that the Tcl event loop is active while your application waits for the connection:
set sock [socket -async host port]
fileevent $sock w {set connected 1}
global connected
vwait connected
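Because the event loop stays active during an asynchronous connect, a timer can run alongside it. The following sketch adds a timeout to the idea above. The names ConnectWithTimeout and connectState are hypothetical, and the check with fconfigure -error assumes a Tcl version that supports that option on sockets:

```tcl
# Hypothetical helper: connect in the background, give up after ms
# milliseconds, and keep the event loop running while waiting.
proc ConnectWithTimeout {host port ms} {
    global connectState
    set sock [socket -async $host $port]
    set connectState($sock) pending
    fileevent $sock writable [list set connectState($sock) ready]
    set timer [after $ms [list set connectState($sock) timeout]]
    vwait connectState($sock)
    after cancel $timer
    fileevent $sock writable {}
    set result $connectState($sock)
    unset connectState($sock)
    if {[string compare $result timeout] == 0} {
        close $sock
        error "connection to $host:$port timed out"
    }
    # The socket also becomes writable when the connect fails,
    # so ask the channel whether an error is pending.
    set err [fconfigure $sock -error]
    if {[string length $err] > 0} {
        close $sock
        error $err
    }
    return $sock
}
```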
set mainSocket [socket -server Accept 2540]
proc Accept {newSock addr port} {
    puts "Accepted $newSock from $addr port $port"
}
vwait forever
This example creates a server socket and specifies the Accept command as the server callback. In this simple example, Accept just prints out its arguments. The last argument to the socket command is the server's port number. For your own unofficial servers, you'll need to pick port numbers higher than 1024 to avoid conflicts with existing services.
The vwait command puts Tcl into its event loop so it can do the background processing necessary to accept connections. The vwait command will wait until the forever variable is modified, which won't happen in this simple example. The key point is that Tcl processes other events (e.g., network connections and other file I/O) while it waits. If you have a Tk application (e.g., wish), then it already has an event loop to handle window system events, so you do not need to use vwait. The Tcl event loop is discussed on page 155.
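When you experiment with a server from tclsh, it can be handy to run the event loop for a bounded time instead of forever. The following is a small sketch of that idea; RunEventLoop and loopDone are hypothetical names:

```tcl
# Hypothetical helper: service events (including new connections)
# for ms milliseconds, then return to the caller.
proc RunEventLoop {ms} {
    global loopDone
    set loopDone 0
    # The timer modifies the variable, which ends the vwait
    after $ms {set loopDone 1}
    vwait loopDone
}
```

For example, RunEventLoop 5000 accepts connections for about five seconds and then returns.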
Server Socket Options
By default, Tcl lets the operating system choose the network interface used for the server socket, and you just supply the port number. If your computer has multiple interfaces you may want to specify a particular one. Use the -myaddr option for this. The general form of the command to open server sockets is:
socket -server callback ?-myaddr address? port
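As an illustration of the -myaddr option, the following sketch binds a server socket to the loopback interface only, so that it accepts connections from the local machine but not from the network. The LocalServer name is hypothetical. The -sockname option of fconfigure reports the address, host name, and port that were actually bound:

```tcl
# Hypothetical helper: listen on the loopback interface only.
proc LocalServer {callback port} {
    set listener [socket -server $callback -myaddr 127.0.0.1 $port]
    puts "Listening on [fconfigure $listener -sockname]"
    return $listener
}
```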
proc Echo_Server {port} {
    global echo
    set echo(main) [socket -server EchoAccept $port]
}
proc EchoAccept {sock addr port} {
    global echo
    puts "Accept $sock from $addr port $port"
    set echo(addr,$sock) [list $addr $port]
    fconfigure $sock -buffering line
    fileevent $sock readable [list Echo $sock]
}
proc Echo {sock} {
    global echo
    if {[eof $sock] || [catch {gets $sock line}]} {
        # end-of-file or abnormal connection drop
        close $sock
        puts "Close $echo(addr,$sock)"
        unset echo(addr,$sock)
    } else {
        if {[string compare $line "quit"] == 0} {
            # Prevent new connections.
            # Existing connections stay open.
            close $echo(main)
        }
        puts $sock $line
    }
}
The Echo_Server procedure opens the socket and saves the result in echo(main). When this socket is closed later, the server stops accepting new connections but existing connections won't be affected. If you want to experiment with this server, start it and wait for connections like this:
Echo_Server 2540
vwait forever
The EchoAccept procedure uses the fconfigure command to set up line buffering. This means that each puts by the server results in a network transmission to the client. The importance of this will be described in more detail later. A complete description of the fconfigure command is given on page 158. The EchoAccept procedure uses the fileevent command to register a procedure that handles I/O on the socket. In this example, the Echo procedure will be called whenever the socket is readable. Note that it is not necessary to put the socket into non-blocking mode when using the fileevent callback. The effects of non-blocking mode are discussed on page 159.
EchoAccept saves information about each client in the echo array. This is just used to print out a message when a client closes its connection. In a more sophisticated server, however, you may need to keep more interesting state about each client. The name of the socket provides a convenient handle on the client. In this case it is used as part of the array index.
if {[eof $sock] || [catch {gets $sock line}]} {
Closing the socket automatically clears the fileevent registration.
In the normal case the server simply reads a line with gets and then writes it back to the client with puts. If the line is "quit", then the server closes its main socket. This prevents any more connections by new clients, but it doesn't affect any clients that are already connected.
proc Echo_Client {host port} {
    set s [socket $host $port]
    fconfigure $s -buffering line
    return $s
}
set s [Echo_Client localhost 2540]
puts $s "Hello!"
gets $s
=> Hello!
Example 15-3 shows a sample client of the Echo service. The main point is to ensure the socket is line buffered so that each puts by the client results in a network transmission. (Or, more precisely, each newline character results in a network transmission.) If you forget to set line buffering with fconfigure, the client's gets command will probably hang because the server will not get any data; it will be stuck in buffers on the client.
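An alternative to line buffering is to call flush after each puts, and to read the reply from a fileevent handler so the client never blocks in gets. The following is a sketch of that variation, not the chapter's client; Echo_Send, Echo_Reply, and the echoReply array are hypothetical names:

```tcl
# Hypothetical helper: send one line and flush it explicitly, so the
# data leaves the client even if the socket is fully buffered.
proc Echo_Send {sock line} {
    puts $sock $line
    flush $sock
}

# Hypothetical callback: record the reply and wake up the caller.
proc Echo_Reply {sock} {
    global echoReply
    if {[gets $sock line] >= 0} {
        set echoReply($sock) $line
    } elseif {[eof $sock]} {
        set echoReply($sock) ""
    }
}

# Usage, assuming an echo server is listening on port 2540:
#   set s [socket localhost 2540]
#   fileevent $s readable [list Echo_Reply $s]
#   Echo_Send $s "Hello!"
#   vwait echoReply($s)
#   puts $echoReply($s)
```

Because the reply arrives through the event loop, this form also works when the client and server run in the same Tcl process.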
Fetching a URL with HTTP
The Hypertext Transfer Protocol (HTTP) is the protocol used on the World Wide Web. This section presents a procedure to fetch pages or images from a server on the Web. Items on the Web are identified with a Uniform Resource Locator (URL) that specifies a host, port, and location on the host. The basic outline of HTTP is that a client sends a URL to a server, and the server responds with some header information and some content data. The header information describes the content, which can be hypertext, images, PostScript, and more.
proc Http_Open {url} {
    global http
    if {![regexp -nocase {^(http://)?([^:/]+)(:([0-9]+))?(/.*)} \
            $url x protocol server y port path]} {
        error "bogus URL: $url"
    }
    if {[string length $port] == 0} {
        set port 80
    }
    set sock [socket $server $port]
    puts $sock "GET $path HTTP/1.0"
    puts $sock "Host: $server"
    puts $sock "User-Agent: Tcl/Tk Http_Open"
    puts $sock ""
    flush $sock
    return $sock
}
The Http_Open procedure uses regexp to pick out the server and port from the URL. This regular expression is described in detail on page 119. The leading http:// is optional, and so is the port number. If the port is left off, then the standard port 80 is used. If the regular expression matches, then a socket command opens the network connection.
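The URL-splitting step can be tried on its own, without opening a connection. The following sketch wraps it in a hypothetical HttpParseUrl procedure that returns the server, port, and path as a list. Note that the port digits are grouped inside the capturing parentheses so the whole number is captured:

```tcl
# Hypothetical helper: split a URL into {server port path}.
proc HttpParseUrl {url} {
    if {![regexp -nocase {^(http://)?([^:/]+)(:([0-9]+))?(/.*)$} \
            $url x protocol server y port path]} {
        error "bogus URL: $url"
    }
    # An omitted port leaves the variable empty; default to 80
    if {[string length $port] == 0} {
        set port 80
    }
    return [list $server $port $path]
}

puts [HttpParseUrl http://www.sun.com:8080/index.html]
# => www.sun.com 8080 /index.html
puts [HttpParseUrl www.sun.com/]
# => www.sun.com 80 /
```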
key: value
The Host line identifies the server by name, which lets a single server process implement more than one server name. The User-Agent identifies the client program, which is often a browser like Netscape or Internet Explorer. The key-value lines are terminated with a blank line. This data is flushed out of the Tcl buffering system with the flush command. The server will respond by sending the URL contents back over the socket. This is described shortly, but first we consider proxies.
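The layout of a request (a command line, key-value lines, then a blank line) can be built as a string before anything is written to the socket, which makes it easy to inspect. The following is a minimal sketch; HttpFormatRequest is a hypothetical name and is not used by the procedures in this chapter:

```tcl
# Hypothetical helper: build the text of an HTTP/1.0 request.
# headers is a list of alternating keys and values.
proc HttpFormatRequest {command path server {headers {}}} {
    set lines [list "$command $path HTTP/1.0" "Host: $server"]
    foreach {key value} $headers {
        lappend lines "$key: $value"
    }
    # The empty element produces the newline that ends the last
    # header line; puts then adds the blank line that ends the block.
    lappend lines ""
    return [join $lines \n]
}
```

A caller would send the result with puts and then call flush, for example puts $sock [HttpFormatRequest GET /index.html www.sun.com].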
# Http_Proxy sets or queries the proxy
proc Http_Proxy {{new {}}} {
    global http
    if {[string length $new] == 0} {
        return $http(proxy):$http(proxyPort)
    } else {
        regexp {^([^:]+):([0-9]+)$} $new x \
            http(proxy) http(proxyPort)
    }
}
proc Http_Open {url {command GET} {query {}}} {
    global http
    if {![regexp -nocase {^(http://)?([^:/]+)(:([0-9]+))?(/.*)} \
            $url x protocol server y port path]} {
        error "bogus URL: $url"
    }
    if {[string length $port] == 0} {
        set port 80
    }
    if {[catch {
        # Open, write, and flush to trigger errors with
        # unreachable hosts.
        set sock [socket $server $port]
        puts $sock "$command $path HTTP/1.0"
        flush $sock
    } err]} {
        # Unreachable server. Connect to the proxy server instead.
        if {![info exists http(proxy)]} {
            return -code error $err
        }
        set sock [socket $http(proxy) $http(proxyPort)]
        puts $sock "$command http://$server:$port$path HTTP/1.0"
    }
    puts $sock "User-Agent: Tcl/Tk Http_Open"
    puts $sock "Host: $server"
    puts $sock ""
    if {[string length $query] > 0} {
        puts $sock $query
    }
    flush $sock
    return $sock
}
Example 15-6 implements the HEAD command, which does not involve any reply data:
proc Http_Head {url} {
    upvar #0 $url state
    set state(sock) [Http_Open $url HEAD]
    set state(http) "500 unknown error"
    fileevent $state(sock) readable [list HttpHeader $url]
    # Specify the real name, not the upvar alias, to vwait
    vwait $url\(status)
    catch {close $state(sock)}
    return $state(http)
}
proc HttpHeader {url} {
    upvar #0 $url state
    if {[eof $state(sock)]} {
        set state(status) eof
        close $state(sock)
        return
    }
    if {[catch {gets $state(sock) line} nbytes]} {
        set state(status) error
        lappend state(headers) [list error $nbytes]
        close $state(sock)
        return
    }
    if {$nbytes < 0} {
        # Read would block
        return
    } elseif {$nbytes == 0} {
        # Header complete
        set state(status) head
    } elseif {![info exists state(headers)]} {
        # Initial status reply from the server, e.g. "HTTP/1.0 200 OK".
        # Drop the protocol version so that "code message" is saved.
        regsub {^HTTP/[0-9.]+ +} $line {} line
        set state(http) $line
        set state(headers) [list http $line]
    } else {
        # Process key-value pairs, trimming the space after the colon
        if {[regexp {^([^:]+):(.*)$} $line x key value]} {
            lappend state(headers) [string tolower $key] [string trim $value]
        }
    }
}
The Http_Head procedure uses Http_Open to contact the server. The HttpHeader procedure is registered as a fileevent handler to read the server's reply. A global array keeps state about each operation. The URL is used in the array name, and upvar is used to create an alias to the name (upvar is described on page 78):
upvar #0 $url state
You cannot use the upvar alias as the variable specified to vwait. Instead, you must use the actual name. The backslash turns off the array reference in order to pass the name of the array element to vwait; otherwise Tcl tries to reference url as an array:
vwait $url\(status)
The HttpHeader procedure checks for special cases: end-of-file, an error on the gets, or a short read on a non-blocking socket. The header lines are put onto a list, state(headers), where the list values alternate between keys and values. The very first reply line contains a status code from the server that is in a different format than the rest of the header lines:
code message
The code is a 3-digit numeric code. 200 is OK. Codes in the 400's and 500's indicate an error. The codes are explained fully in RFC XXX that specifies HTTP 1.0. The first line is saved with the key http:
set state(headers) [list http $line]
The rest of the header lines are parsed into key-value pairs and appended onto state(headers). This format can be used to initialize an array:
array set header $state(headers)
When HttpHeader gets an empty line, the header is complete and it sets the state(status) variable, which signals Http_Head. Finally, Http_Head returns the HTTP status line to its caller. The complete information about the request is still in the global array named by the URL. Example 15-7 illustrates the use of Http_Head:
set url http://www.sun.com/
set status [Http_Head $url]
=> 200 OK
upvar #0 $url state
array set info $state(headers)
parray info
info(http) 200 OK
info(server) Apache ...
info(last-modified) Nov ...
info(content-type) text/html
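The way header lines turn into an alternating key-value list, and then into an array, can be seen without a network connection. The following sketch applies the same regexp to a fixed list of lines; HttpParseHeaders is a hypothetical name, and the sample header lines are made up for illustration:

```tcl
# Hypothetical helper: convert "Key: value" lines into a list of
# alternating lowercase keys and trimmed values.
proc HttpParseHeaders {lines} {
    set result {}
    foreach line $lines {
        if {[regexp {^([^:]+):(.*)$} $line x key value]} {
            lappend result [string tolower $key] [string trim $value]
        }
    }
    return $result
}

array set header [HttpParseHeaders {
    "Content-Type: text/html"
    "Server: Apache"
}]
puts $header(content-type)
# => text/html
```

Folding the keys to lowercase means the caller can look up content-type no matter how the server capitalizes it.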
proc Http_Get {url {query {}}} {
    upvar #0 $url state          ;# Alias to global array
    upvar #0 $url\(sock) sock    ;# Alias to array element
    catch {unset state}          ;# Reset state. Aliases remain.
    if {[string length $query] > 0} {
        set sock [Http_Open $url POST $query]
    } else {
        set sock [Http_Open $url GET]
    }
    fileevent $sock readable [list HttpHeader $url]
    # Specify the real name, not the upvar alias, to vwait
    vwait $url\(status)
    set header(content-type) {}
    set header(http) "500 unknown error"
    array set header $state(headers)
    # Check return status.
    # 200 is OK, other codes indicate a problem.
    if {![string match 2* $header(http)]} {
        catch {close $sock}
        if {[info exists header(location)] &&
                [string match 3* $header(http)]} {
            # 3xx is a redirection to another URL
            set state(link) $header(location)
            return [Http_Get $header(location)]
        }
        return -code error $header(http)
    }
    # Set up to read the content data
    switch -glob -- $header(content-type) {
        text/* {
            # Read HTML into memory
            fconfigure $sock -blocking off
            fileevent $sock readable [list HttpGetText $url]
        }
        default {
            # Copy content data to a file
            fconfigure $sock -translation binary -blocking off
            set state(filename) [File_TempName http]
            if {[catch {open $state(filename) w} out]} {
                set state(status) error
                set state(error) $out
                close $sock
                return $header(content-type)
            }
            set state(fd) $out
            fileevent $sock readable [list HttpCopyData $url]
        }
    }
    vwait $url\(status)
    return $header(content-type)
}
Http_Get uses Http_Open to initiate the request, and then it looks for errors. It handles redirection errors that occur if a URL has changed. These have error codes that begin with 3. A common case of this is when a user omits the trailing slash on a URL (e.g., http://www.sun.com). Most servers respond with:
302 Document has moved
Location: http://www.sun.com/
If the content-type is text, then Http_Get sets up a fileevent handler to read this data into memory. The socket is put into non-blocking mode so the read handler can read as much data as possible each time it is called. This is more efficient than using gets to read a line at a time. The text will be stored in the state(text) variable for use by the caller of Http_Get. Example 15-9 shows the HttpGetText fileevent handler:
proc HttpGetText {url} {
    upvar #0 $url state
    if {[eof $state(sock)]} {
        # Content complete
        set state(status) done
        close $state(sock)
    } elseif {[catch {read $state(sock)} block]} {
        set state(status) error
        lappend state(headers) [list error $block]
        close $state(sock)
    } else {
        append state(text) $block
    }
}
The content may be in binary format. This poses a problem for Tcl 7.6 and earlier: A null character will terminate the value, so values with embedded nulls cannot be stored safely in Tcl variables. Tcl 8.0 supports strings and variable values with arbitrary binary data. The Http_Get procedure copies non-text content data to a temporary file. The HttpCopyData procedure shown in Example 15-10 uses an undocumented Tcl command, unsupported0, that copies data from one I/O channel to another without storing it in Tcl variables.
rename unsupported0 copychannel
proc HttpCopyData {url} {
    upvar #0 $url state
    if {[eof $state(sock)]} {
        # Content complete
        set state(status) done
        close $state(sock)
        close $state(fd)
    } elseif {[catch {copychannel $state(sock) $state(fd)} x]} {
        set state(status) error
        lappend state(headers) [list error $x]
        close $state(sock)
        close $state(fd)
    }
}
The user of Http_Get uses the information in the state array to determine the status of the fetch and where to find the content. There are four cases to deal with:
upvar #0 $state(link) state
unsupported0 input output ?chunksize?
The command reads from the input channel and writes to the output channel. The number of bytes transferred is returned. If chunksize is specified, then at most this many bytes are read from input. If input is in blocking mode, then unsupported0 will block until chunksize bytes are read, or until end of file. If input is non-blocking, all available data from input is read, up to chunksize bytes, and copied to output. If output is non-blocking, then unsupported0 queues all the data read from input and returns. Otherwise, unsupported0 could block when writing to output.
The name is easy to fix:
if {[info commands unsupported0] == "unsupported0"} {
    rename unsupported0 copychannel
}
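Tcl 8.0 and later provide a documented command, fcopy, that also copies data between channels without passing it through Tcl variables. The following sketch shows the simplest synchronous use on two files. The CopyFile name is hypothetical, and the sketch does not try to reproduce the non-blocking behavior of unsupported0 described above:

```tcl
# Hypothetical helper: copy a file byte-for-byte with fcopy and
# return the number of bytes copied.
proc CopyFile {from to} {
    set in [open $from r]
    set out [open $to w]
    # Binary translation keeps null bytes and line endings intact
    fconfigure $in -translation binary
    fconfigure $out -translation binary
    set nbytes [fcopy $in $out]
    close $in
    close $out
    return $nbytes
}
```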
welch@acm.org Copyright © 1996, Brent Welch. All rights reserved.