geostatistics, spatial statistics, Donald E. Myers
This site is maintained by Donald E. Myers
You need not be a statistician to make good use of
geostatistics, but you may need the assistance, support,
guidance of a (geo?)statistician. A good engineer,
ecologist, biologist, plant scientist, hydrologist, soil
physicist already has a good start, because geostatistics
is only good science brought up to date by the recognition
that natural phenomena are subject to spatial variation.
Your study of geostatistics will not displace other
knowledge that you have; rather, it will extend your
knowledge and make it more useful.
(paraphrased from a quotation of William Edwards Deming)
A BIT OF HISTORY
The application of statistics to problems in geology and mining as well as to hydrology dates
back a considerable time. For a time, geostatistics meant statistics applied to geology or perhaps more
generally to problems in the earth sciences. Beginning in the mid-60's and especially in the mid-70's
it became much more closely affiliated with the work of Georges Matheron and perhaps that
connection is still the prevailing one today. Because much of his early work and also that of his
students appeared primarily in French, it was not as well known in the US and other countries. Several
events began to change that, however. In 1975 a NATO ASI was held near Rome, Italy on Advanced
Geostatistics in the Mining Industry. The proceedings contained papers that were primarily in
English. This had been preceded by a set of notes (by Matheron) prepared for a summer program in
Fontainebleau. These notes were in English but not readily available. A more definitive theoretical
article appeared in the J. Applied Probability in 1973.
Professeur Matheron was at the Ecole Nationale Superieure des Mines de Paris (School of
Mines), one of the Grandes Ecoles. As part of a general move of research units out from the main
location in Paris (adjacent to the Jardin du Luxembourg), Matheron established the Centre de
Morphologie Mathematique in Fontainebleau. Later this became two programs, one on mathematical
morphology and one on geostatistics. Matheron retired as Director of the Center only last year. Jean
Serra's two-volume series on mathematical morphology and image analysis is well-known and is based on
Matheron's earlier book on random set theory. Two of Matheron's students were instrumental in
implanting geostatistics in North America. Andre Journel moved to Stanford University in 1978 and
also co-authored Mining Geostatistics with Ch. Huijbregts. Michel David had earlier moved to the
Ecole Polytechnique in Montreal and in 1977 published Geostatistical Ore Reserve Estimation.
Journel was in the Department of Applied Earth Sciences but more recently that department has been
closed and he is now in the Department of Petroleum Engineering and has established (with the aid
of various oil companies) the Stanford Center for Reservoir Forecasting.
Matheron's work was not very well accepted in the statistical community for a period of time
although a number of prominent statisticians were visitors at Fontainebleau in the '70's, '80's and
'90's. In part this was because of a feeling that some of the work was a duplication of results that
were already well-known but with different names. Matheron's propensity to only publish in French
and only in "internal notes" at the Center probably contributed to this perception. Now, however,
geostatistics has established a place for itself both within the statistics journals and at national
meetings. In the mid-'80's, with the help of M. Armstrong, an index of those notes was published in
Mathematical Geology. While it was possible to order xeroxed copies from the Center, there was no
generally accessible repository outside of the Center. The index noted above is now well out of date.
Again with the assistance of M. Armstrong, a small number of these notes have appeared as journal
articles.
Software
In the late '70's the Centre de Geostatistique, Fontainebleau, began a master's-level program
in geostatistics (two years) which attracted a steady stream of students from industry and government
in various countries. In cooperation with Shell Oil and the Bureau de Recherches Geologiques et
Minieres (the French counterpart of the USGS), a commercial software package called BLUEPACK was
developed. The early version was only ported to the VAX but the successor, ISATIS, is available on
a number of workstation platforms. It is marketed in the US by GEOMATH of Houston.
Geostatistics without the computer is of little interest; in many ways the developments in geostatistics
parallel those in computing, particularly the appearance of PC's and workstations.
Publications and Conferences
Two small volumes on geostatistics focused on mining appeared in English in the '70's, one
by Jean-Michel Rendu and one by Isobel Clark. In the late '80's the volume by Isaaks and Srivastava
appeared, followed by a book by Noel Cressie (on the more general topic of spatial statistics but
including geostatistics).
In the summer of 1983 a second NATO ASI was held at Lake Tahoe, NV with a more
international mix and including researchers from a wider set of applications. Thanks to a series of four
papers by Richard Webster and some of his students (then at Rothamsted Research Center in
England), geostatistics became known in the soil sciences. These appeared in the J. Soil Science
(1980-1981). In 1968 in Prague, the International Association for Mathematical Geology was
founded and later began publishing the J. of the Int. Assn. for Math. Geology (later the name was
officially changed to simply Mathematical Geology). While the journal was not limited to geostatistics,
it quickly became a principal place to publish such papers. A third international geostatistics congress
was held in Avignon, France in 1988, a fourth in Troia, Portugal in 1992 and the most recent in
Wollongong, Australia in 1996. Following the 1983 conference, Andre Journel and Leon Borgman
(University of Wyoming) proposed an annual summer retreat in geostatistics aimed at researchers in
North America. The first one was held near Dubois, Wyoming in August of 1984. The group was
small and families were encouraged; the sessions were informal and no proceedings were produced,
but subsequently a newsletter was started which has appeared infrequently since then. A
non-organization was founded in 1987 at a meeting in the Chiricahua Mountains southeast of Tucson:
there were to be no dues, no membership list, and no subscription price for the newsletter, but
volunteers would be solicited each year to organize meetings. Several have been held in Canada as
well as the US, and in 1996 a meeting was held in Guanajuato, Mexico. Following the establishment of
the newsletter in North America, another newsletter intended for the European community was established.
Following the 1983 meeting several staff members at EPA-Las Vegas became interested in
the application of geostatistics to environmental monitoring and assessment. In addition to research
support for a number of individuals and programs, EPA commissioned a geostatistical software
package, GEO-EAS, which was then released into the public domain. GEO-EAS was a DOS program
but included a menuing system that made it fairly friendly and the price was right. Unfortunately, for
various reasons, EPA has not continued to support the software and it has not been updated for a
number of years. In 1992 Andre Journel and Clayton Deutsch published GSLIB, which included a
floppy disk. This was an extensive set of geostatistical programs (FORTRAN source code) and a
user's manual. Current versions of the code are available on the website at Stanford. Unfortunately,
the programs do not include any form of GUI and are intended to be run in batch mode. They are,
however, compilable on a variety of platforms. In 1996 Yvan Pannatier published VARIOWIN together
with a floppy disk. VARIOWIN is an MS-Windows version of two of the components of GEO-EAS.
It allows for much larger data sets than in GEO-EAS and also interactive variogram modeling.
Geostatistics at the University of Arizona
Geostatistics has been taught at the University of Arizona since the fall of 1982; however,
collaborative work began much earlier between Y.C. Kim and Donald Myers and between A. W.
Warrick (Soil, Water and Environmental Sciences) and Donald Myers. The courses rapidly attracted
students from a variety of departments: Mining Engineering, Hydrology, and Soil, Water and
Environmental Sciences. More recently students from Remote Sensing, Plant Pathology, Geography,
the Tree-Ring Lab, and Renewable Natural Resources have been attracted. This is a direct consequence of
the quantitative nature of the research in these various programs.
Other Developments
There were three other developments that should not be overlooked. B. Matern, working in
Sweden, developed essentially a parallel theory to Matheron's but with applications primarily in forestry.
His work appeared in Swedish in 1960 and was not translated into English until 1986 (Springer-Verlag).
L. Gandin, working in the former Soviet Union, applied his work primarily in meteorology
and atmospheric sciences where it was known as Objective Analysis. This work did not appear in
English until much later when he emigrated to Israel. Finally, in 1971, R. Hardy (Iowa State
University), working on problems related to the interpolation of gravity data, developed what became
known as Radial Basis Functions. His work is much better known in the numerical analysis literature.
Applications
Geostatistics is very much an applied discipline (or perhaps it is not even a discipline). Its
development has been the work of mining engineers, petroleum engineers, hydrologists, soil scientists,
and geologists as well as statisticians. There are applications in epidemiology, plant pathology and
entomology as well as forestry, atmospheric sciences, global change, and geography. There is some
overlap with GIS (geographic information systems) and spatial statistics in general. Two other
journals should especially be noted: Water Resources Research and the J. Soil Science Society of
America. More recently articles have begun appearing in Environmetrics and Remote Sensing of
Environment, as well as many others too numerous to mention.
As noted above, hydrology was an early application. The activity at three locations should be
noted: L. Gelhar's group at MIT (which has links to New Mexico Tech in Socorro), the Hydrology
group at Fontainebleau (particularly G. de Marsily, who is now at Universite Paris-Jussieu) and, of
course, the Hydrology Department at the University of Arizona.
PROBLEMS AND OBJECTIVES
In one respect geostatistics might be viewed as simply a methodology for interpolating data
on an irregular pattern, but this is too simplistic. A number of interpolation methods/algorithms were
already well known when geostatistics began to be known, including Inverse Distance Weighting and
Trend Surface Analysis as well as the much simpler Nearest Neighbor Algorithm.
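
As a rough illustration of two of the methods just named, the following Python sketch implements
Nearest Neighbor and Inverse Distance Weighting for scattered data. The function names, the choice
of power 2 for the weights, and the four data points are assumptions made up for this example; they
do not come from any particular software package.

```python
import math


def nearest_neighbor(points, values, target):
    """Return the value observed at the data location closest to target."""
    distances = [math.dist(p, target) for p in points]
    return values[distances.index(min(distances))]


def inverse_distance_weighting(points, values, target, power=2.0):
    """Weighted average of the data, with weights proportional to 1 / distance**power."""
    weights = []
    for p, v in zip(points, values):
        d = math.dist(p, target)
        if d == 0.0:
            return v  # target coincides with a data location
        weights.append(1.0 / d ** power)
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


# Hypothetical data: four locations in the plane with made-up observed values.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
values = [1.0, 3.0, 2.0, 6.0]
target = (0.25, 0.25)

nn_estimate = nearest_neighbor(points, values, target)
idw_estimate = inverse_distance_weighting(points, values, target)
print(nn_estimate, idw_estimate)
```

The two methods generally return different values at the same unsampled location, and neither one
says anything about how uncertain its answer is.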
First of all, geostatistics is concerned with spatial data. That is, each data value is associated
with a location in space and there is at least an implied connection between the location and the data
value. "Location" has at least two meanings; one is simply a point in space (which only exists in an
abstract mathematical sense) and the other is an area or volume in space. For example, a data value
associated with an area might be the average value of an observed variable, averaged over that
area. In the latter case the area or volume is often called the "support" of the data. This is closely
related to the idea of the support of a measure. Let x, y, ..., w be points (not just coordinates) in 1-,
2-, or 3-dimensional space and let Z(x), Z(y), ... denote observed values at these locations. For example,
these might be the grade of copper, temperature, hydraulic conductivity, or concentration of a pollutant.
Now suppose that t is a location that is not "sampled". The objective then is to estimate/predict the
value Z(t), given the data values (and the data locations as well as the location t). If only this information is given then the
problem is ill-posed, i.e., it does not have a unique solution. One way to obtain a unique solution is
to introduce a model into the problem. There are two ways to do this; one is deterministic and the
second is stochastic or statistical. Both approaches must somehow incorporate the idea that there is
uncertainty associated with the estimation/prediction step. The value at the unsampled location is not
itself random but our knowledge of it is uncertain. One approach then is to treat Z(x), Z(y), ... and
Z(t) as being the values of random variables. If the joint distribution of these random variables were
known, then the "best" estimator (best meaning unbiased and having minimal variance of the error of
estimation) would be the conditional expectation of Z(t) given the values of the other random
variables. However, the data consist of only one observation of each of the random variables Z(x), Z(y), ...
and none of the random variable Z(t); hence it is not possible to estimate or model this distribution
using standard ways of modeling or fitting probability distributions.
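
To make the "if the joint distribution were known" case concrete, here is a minimal Python sketch
under one assumption that the text above does not make: that Z(x), Z(y) and Z(t) are jointly
Gaussian with known means and covariances. In that special case the conditional expectation is a
linear function of the data. The function name and all numbers are hypothetical and chosen only for
illustration; in practice these quantities are exactly what is not available.

```python
def gaussian_conditional_mean(mean_data, mean_t, cov_data, cov_data_t, observed):
    """Conditional expectation of Z(t) given two jointly Gaussian data values.

    mean_data   -- assumed means of Z(x), Z(y)
    mean_t      -- assumed mean of Z(t)
    cov_data    -- assumed 2x2 covariance matrix of (Z(x), Z(y))
    cov_data_t  -- assumed covariances of Z(x) and Z(y) with Z(t)
    observed    -- observed values of Z(x), Z(y)
    """
    (a, b), (c, d) = cov_data
    det = a * d - b * c
    # Solve cov_data * w = cov_data_t for the weights w (Cramer's rule, 2x2 case).
    w1 = (cov_data_t[0] * d - b * cov_data_t[1]) / det
    w2 = (a * cov_data_t[1] - c * cov_data_t[0]) / det
    # Shift the assumed mean of Z(t) by the weighted deviations of the data from their means.
    r1 = observed[0] - mean_data[0]
    r2 = observed[1] - mean_data[1]
    return mean_t + w1 * r1 + w2 * r2


# Hypothetical model: common mean 10, unit variances, made-up covariances.
mean_data = [10.0, 10.0]
mean_t = 10.0
cov_data = [[1.0, 0.5], [0.5, 1.0]]
cov_data_t = [0.8, 0.6]
observed = [12.0, 9.0]

estimate = gaussian_conditional_mean(mean_data, mean_t, cov_data, cov_data_t, observed)
print(estimate)
```

Everything in this calculation depends on the assumed means and covariances, and a single
observation at each data location does not allow them to be fitted by standard methods.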