% This file is part of the Stanford GraphBase (c) Stanford University 1992
\def\title{GB\_\thinspace MILES}
@i boilerplate.w %<< legal stuff: PLEASE READ IT BEFORE MAKING ANY CHANGES!
\prerequisites{GB\_\thinspace GRAPH}{GB\_\thinspace IO}
@* Introduction. This GraphBase module contains the |miles| subroutine,
which creates a family of undirected graphs based on highway mileage data
between North American cities. Examples of the use of this procedure can be
found in the demo programs |miles_span| and |gb_plane|.
@(gb_miles.h@>=
extern Graph *miles();
@ The subroutine call {\advance\thinmuskip 0mu plus 2mu
`|miles(n,north_weight,west_weight,pop_weight,max_distance,max_degree,seed)|'}
constructs a graph based on the information in \.{miles.dat}.
Each vertex of the graph corresponds to one of the 128 cities whose
name is alphabetically greater than or equal to `Ravenna, Ohio' in
the 1949 edition of Rand McNally {\char`\&} Company's {\sl Standard Highway
Mileage Guide}. Edges between vertices are assigned lengths representing
distances between cities in miles. In most cases these mileages come
from the Rand McNally Guide, but several dozen entries needed to be changed
drastically because they were obviously too large or too small; in such cases
an educated guess was made. Furthermore, about 5\% of the entries were
adjusted slightly in order to
ensure that all distances satisfy the ``triangle inequality'': The
graph generated by |miles| has the property that the
distance from |u| to~|v| plus the distance from |v| to~|w| always exceeds
or equals the distance from |u| to~|w|.
The constructed graph will have $\min(n,128)$ vertices; the default value
|n=128| is substituted if |n=0|. If |n| is less
than 128, the |n| cities will be selected by assigning a weight to
each city and choosing the |n| with largest weight, using random
numbers to break ties in case of equal weights. Weights are computed
by the formula
$$ |north_weight|\cdot|lat|+|west_weight|\cdot|lon|+|pop_weight|\cdot|pop|, $$
where |lat| is latitude north of the equator, |lon| is longitude
west of Greenwich, and |pop| is the population in 1980. Both |lat| and |lon|
are given in ``centidegrees,'' hundredths of degrees. For example,
San Francisco has |lat=3778|, |lon=12242|, and |pop=678974|;
this means that, before the recent earthquake, it was located at
$37.78^\circ$ north latitude and $122.42^\circ$ west longitude, and that it had
678,974 residents in the 1980 census. The weight parameters must satisfy
$$ \vert|north_weight|\vert\le100{,}000,\quad
\vert|west_weight|\vert\le100{,}000,\quad
\vert|pop_weight|\vert\le100.$$
The constructed graph will be ``complete''---that is, it will have
edges between every pair of vertices---unless special values are given to
the parameters
|max_distance| or |max_degree|. If |max_distance!=0|, edges with more
than |max_distance| miles will not appear; if |max_degree!=0|, each
vertex will be limited to at most |max_degree| of its shortest edges.
Vertices of the graph will appear in order of decreasing weight.
The |seed| parameter defines the pseudo-random numbers used wherever
a ``random'' choice between equal-weight vertices or equal-length edges
needs to be made.
@d MAX_N 128
@(gb_miles.h@>=
#define MAX_N 128 /* maximum and default number of cities */
@ Examples: The call |miles(100,0,0,1,0,0,0)| will construct a complete graph on
100 vertices, representing the 100 most populous cities in the database.
It turns out that San Diego, with a population of 875,538, is the winning city
by this criterion, followed by San Antonio (population 786,023),
San Francisco (678,974), and Washington D.C. (638,432).
To get |n| cities in the western United States and Canada, you can say
$|miles|(n,0,1,0,\ldots\,)$; to get |n| cities in the Northeast, use a
call like $|miles|(n,1,-1,0,\ldots\,)$. A parameter setting like
$(50,-500,0,1,\ldots\,)$ produces mostly Southern cities, except for a
few large metropolises in the north.
If you ask for |miles(n,a,b,c,0,1,0)|, you get an edge between cities if
and only if each city is the nearest to the other, among the |n| cities
selected. (The graph is always undirected: There is an arc from |u| to~|v|
if and only if there's an arc of the same length from |v| to~|u|.)
A random selection of cities can be obtained by calling |miles(n,0,0,0,m,d,s)|.
Different choices of the seed number |s| will produce different selections,
in a system-independent manner; identical results will be obtained on
all computers when identical parameters have been specified. Equivalent
experiments on algorithms for graph manipulation can therefore be performed
by researchers in different parts of the world. Any value of |s| between
0 and $2^{31}-1$ is permissible.
@ If the |miles| routine encounters a problem, it returns |NULL|
(\.{NULL}), after putting a code number into the external variable
|panic_code|. This code number identifies the type of failure.
Otherwise |miles| returns a pointer to the newly created graph, which
will be represented with the data structures explained in |gb_graph|.
(The external variable |@!panic_code| is itself defined in |gb_graph|.)
@d panic(c) @+{@+panic_code=c;@+gb_alloc_trouble=0;@+return NULL;@+}
@#
@f Graph int /* |gb_graph| defines the |Graph| type and a few others */
@f Vertex int
@f Arc int
@f Area int
@ The \Cee\ file \.{gb\_miles.c} has the following overall shape:
@p
#include "gb_io.h" /* we will use the |gb_io| routines for input */
#include "gb_flip.h" /* we will use the |gb_flip| routines for random numbers */
#include "gb_graph.h" /* we will use the |gb_graph| data structures */
#include "gb_sort.h" /* and the linksort routine */
@#
@<Type declarations@>@;
@<Private variables@>@;
@#
Graph *miles(n,north_weight,west_weight,pop_weight,
max_distance,max_degree,seed)
unsigned n; /* number of vertices desired */
long north_weight; /* coefficient of latitude in the weight function */
long west_weight; /* coefficient of longitude in the weight function */
int pop_weight; /* coefficient of population in the weight function */
unsigned max_distance; /* maximum distance in an edge, if nonzero */
unsigned max_degree; /* maximum number of edges per vertex, if nonzero */
long seed; /* random number seed */
{@+@<Local variables@>@;
gb_init_rand(seed);
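@<Check the parameters and adjust them for defaults@>;
@<Set up a graph with |n| vertices@>;
@<Read the data file \.{miles.dat} and compute city weights@>;
@<Determine the |n| cities to use in the graph@>;
@<Put the appropriate edges into the graph@>;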
if (gb_alloc_trouble) {
gb_recycle(new_graph);
panic(alloc_fault); /* oops, we ran out of memory somewhere back there */
}
return new_graph;
}
@ @<Local variables@>=
Graph *new_graph; /* the graph constructed by |miles| */
register int j,k; /* all-purpose indices */
@ @<Check the parameters and adjust them for defaults@>=
if (n==0 || n>MAX_N) n=MAX_N;
if (north_weight>100000 || north_weight<-100000 @|
|| west_weight>100000 || west_weight<-100000 @|
|| pop_weight>100 || pop_weight<-100)
panic(bad_specs); /* the magnitude of at least one weight is too big */
@ @<Set up a graph with |n| vertices@>=
new_graph=gb_new_graph(n);
if (new_graph==NULL)
panic(no_room); /* out of memory before we're even started */
sprintf(new_graph->id,"miles(%u,%ld,%ld,%d,%u,%u,%ld)",
n,north_weight,west_weight,pop_weight,max_distance,max_degree,seed);
strcpy(new_graph->format,"ZZIIIIZZZZZZZZ");
@* Vertices. As we read in the data, we construct a list of nodes,
each of which contains a city's name, latitude, longitude, population,
and weight. These nodes conform to the specifications stipulated in
the |gb_sort| module. After the list has been sorted by weight, the
top |n| entries will be the vertices of the new graph.
@<Type declarations@>=
typedef struct node_struct { /* records to be sorted by |gb_linksort| */
long key; /* the nonnegative sort key (weight plus $2^{30}$) */
struct node_struct *link; /* pointer to next record */
int kk; /* index of city in the original database */
long lat,lon,pop; /* latitude, longitude, population */
char name[30]; /* |"City Name, ST"| */
} node;
@ The constants defined here are taken from the specific data in \.{miles.dat},
because this routine is not intended to be perfectly general.
@<Private variables@>=
int min_lat=2672, max_lat=5042, min_lon=7180, max_lon=12312,
min_pop=2521, max_pop=875538; /* tight bounds on data entries */
node *node_block; /* array of nodes holding city info */
int *distance; /* array of distances */
@ The data in \.{miles.dat} appears in 128 groups of lines, one for each
city, in reverse alphabetical order. These groups have the general form
$$\vcenter{\halign{\tt#\hfil\cr
City Name, ST[lat,lon]pop\cr
d1 d2 d3 d4 d5 d6 ... (possibly several lines' worth)\cr
}}$$
where \.{City Name} is the name of the city (possibly including spaces);
\.{ST} is the two-letter state code; \.{lat} and \.{lon} are latitude
and longitude in hundredths of degrees; \.{pop} is the population; and
the remaining numbers \.{d1}, \.{d2}, \dots\ are
distances to the previously named cities in reverse order, each separated
from the previous item by either a blank space or a newline character.
For example, the line
$$\hbox{\tt San Francisco, CA[3778,12242]678974}$$
specifies the data about San Francisco that was mentioned earlier.
From the first few groups
$$\vcenter{\halign{\tt#\hfil\cr
Youngstown, OH[4110,8065]115436\cr
Yankton, SD[4288,9739]12011\cr
966\cr
Yakima, WA[4660,12051]49826\cr
1513 2410\cr
Worcester, MA[4227,7180]161799\cr
2964 1520 604\cr
}}$$
we learn that the distance from Worcester, Massachusetts, to Yakima, Washington,
is 2964 miles; from Worcester to Youngstown it is 604 miles.
The following two-letter ``state codes'' are used for Canadian provinces:
$\.{BC}=\null$British Columbia,
$\.{MB}=\null$Manitoba,
$\.{ON}=\null$Ontario,
$\.{SA}=\null$Saskatchewan. (Please don't ask what code would have been used to
distinguish New Brunswick from Nebraska if the need had arisen.)
@<Read the data file \.{miles.dat} and compute city weights@>=
node_block=gb_alloc_type(MAX_N,@[node@],new_graph->aux_data);
distance=gb_alloc_type(MAX_N*MAX_N,@[int@],new_graph->aux_data);
if (gb_alloc_trouble) {
gb_free(new_graph->aux_data);
panic(no_room+1); /* no room to copy the data */
}
if (gb_open("miles.dat")!=0)
panic(early_data_fault);
/* couldn't open |"miles.dat"| using GraphBase conventions;
|io_errors| tells why */
for (k=MAX_N-1; k>=0; k--) @<Read and store data for city |k|@>;
if (gb_close()!=0)
panic(late_data_fault);
/* something's wrong with |"miles.dat"|; see |io_errors| */
@ The bounds we've imposed on |north_weight|, |west_weight|, and |pop_weight|
guarantee that the key value computed here will be between 0 and~$2^{31}$.
@<Read and store data for city |k|@>=
{@+register node *p;
p=node_block+k;
p->kk=k;
if (k) p->link=p-1;
gb_string(p->name,'[');
if (gb_char()!='[') panic(syntax_error); /* out of sync in \.{miles.dat} */
p->lat=gb_number(10);
if (p->lat<min_lat || p->lat>max_lat || gb_char()!=',')
panic(syntax_error+1); /* latitude data was clobbered */
p->lon=gb_number(10);
if (p->lon<min_lon || p->lon>max_lon || gb_char()!=']')
panic(syntax_error+2); /* longitude data was clobbered */
p->pop=gb_number(10);
if (p->pop<min_pop || p->pop>max_pop)
panic(syntax_error+3); /* population data was clobbered */
p->key=north_weight*(p->lat-min_lat)
+west_weight*(p->lon-min_lon)
+pop_weight*(p->pop-min_pop)+0x40000000;
@<Read the mileage data for city |k|@>;
gb_newline();
}
@ @d d(j,k) *(distance+(MAX_N*j+k))
@<Read the mileage data for city |k|@>=
{@+register int j; /* number of the other city */
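for (j=k+1; j<MAX_N; j++) {
if (gb_char()!=' ')
gb_newline(); /* the next distance is on a new line */
d(j,k)=d(k,j)=gb_number(10);
}
}
@ Once all the nodes have been set up, |gb_linksort| sorts them by key, and
we take cities in decreasing order of weight from the lists
|gb_sorted[127]|, \dots,~|gb_sorted[0]| until |n| vertices have been filled;
equal weights come out in an order determined by the random numbers from |seed|.
Every city that isn't chosen gets population zero, which marks it as unused.
@<Determine the |n| cities to use in the graph@>=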
{@+register node *p; /* the current node being considered */
register Vertex *v=new_graph->vertices; /* the first unfilled vertex */
gb_linksort(node_block+MAX_N-1);
for (j=127; j>=0; j--)
for (p=(node*)gb_sorted[j]; p; p=p->link) {
if (v<new_graph->vertices+n) @<Add city |p->kk| to the graph@>@;
else p->pop=0; /* this city is not being used */
}
}
@ Utility fields |x| and |y| for each vertex are set to coordinates that
can be used in geometric computations; these coordinates are obtained by
simple linear transformations of latitude and longitude (not by any
kind of sophisticated polyconic projection). We will have
$$0\le x\le5132, \qquad 0\le y\le 3555.$$
Utility field~|z| is set to the city's index number (0 to 127) in the
original database. Utility field~|w| is set to the city's population.
The coordinates computed here are compatible with those in the \TeX\ file
\.{cities.texmap}. Users may wish to incorporate edited copies of that file
into documents that display results obtained with |miles| graphs.
@.cities.texmap@>
@d x_coord x.i
@d y_coord y.i
@d index_no z.i
@d people w.i
@<Add city |p->kk| to the graph@>=
{
v->x_coord=max_lon-p->lon; /* |x| coordinate is complement of longitude */
v->y_coord=p->lat-min_lat;
v->y_coord+=(v->y_coord)>>1; /* |y| coordinate is 1.5 times the latitude offset, rounded down */
v->index_no=p->kk;
v->people=p->pop;
v->name=gb_save_string(p->name);
v++;
}
@ @(gb_miles.h@>=
#define x_coord @t\quad@> x.i
/* utility field definitions for the header file */
#define y_coord @t\quad@> y.i
#define index_no @t\quad@> z.i
#define people @t\quad@> w.i
@* Arcs. We make the distance negative in the matrix entry for an arc
that is not to be included. Nothing needs to be done in this regard
unless the user has specified a maximum degree or a maximum edge length.
@<Put the appropriate edges into the graph@>=
if (max_distance>0 || max_degree>0)
@<Prune unwanted edges by negating their distances@>;
{@+register Vertex *u,*v;
for (u=new_graph->vertices;u<new_graph->vertices+n;u++) {
j=u->z.i;
for (v=u+1;v<new_graph->vertices+n;v++) {
k=v->z.i;
if (d(j,k)>0 && d(k,j)>0)
gb_new_edge(u,v,d(j,k));
}
}
}
@ @<Prune unwanted edges by negating their distances@>=
{@+register node *p;
if (max_degree==0) max_degree=MAX_N;
if (max_distance==0) max_distance=30000;
for (p=node_block; p<node_block+MAX_N; p++) if (p->pop) { /* this city not deleted */
k=p->kk;
@<Negate the distances of unwanted edges from city |k|@>;
}
}
@ Here we reuse the key fields of the nodes, storing complementary distances
there instead of weights; we also let the sorting routine change the
link fields. But the other fields (especially |pop|)
remain unchanged. Yes, the author knows this is a wee bit tricky,
but why not?
@<Negate the distances of unwanted edges from city |k|@>=
{@+register node *q;
register node*s=NULL; /* list of nodes containing edges from city |k| */
for (q=node_block; q<node_block+MAX_N; q++) if (q->pop && q!=p) { /* another city not deleted */
j=d(k,q->kk); /* distance from |p| to |q| */
if (j>max_distance)
d(k,q->kk)=-j;
else {
q->key=max_distance-j;
q->link=s;
s=q;
}
}
gb_linksort(s);
/* now all the surviving edges from |p| are in the list |gb_sorted[0]| */
j=0; /* |j| counts how many edges have been accepted */
for (q=(node*)gb_sorted[0]; q; q=q->link)
if (++j>max_degree)
d(k,q->kk)=-d(k,q->kk);
}
@ Random access to the distance matrix is provided to users via
the external function |miles_distance|. Caution: This function may be
used only on the graph most recently made by |miles|, and only when
the graph's |aux_data| has not been recycled, and only when the
|z| utility fields have not been used for another purpose.
The result may be negative when an edge has been suppressed. We can in fact
have |miles_distance(u,v)<0| when |miles_distance(v,u)>0|, if the
distance in question was suppressed by the |max_degree| constraint on~|u|
but not on~|v|.
@p int miles_distance(u,v)
Vertex *u,*v;
{
return d(u->z.i,v->z.i);
}
@ @(gb_miles.h@>=
extern int miles_distance();
@* Index. As usual, we close with an index that
shows where the identifiers of \\{gb\_miles} are defined and used.