
WHAT IS GEOSTATISTICS?



This site is maintained by Donald E. Myers




You need not be a statistician to make good use of geostatistics, but you may need the assistance, support, guidance of a (geo?)statistician. A good engineer, ecologist, biologist, plant scientist, hydrologist, soil physicist already has a good start, because geostatistics is only good science brought up to date by the recognition that natural phenomena are subject to spatial variation. Your study of geostatistics will not displace other knowledge that you have; rather, it will extend your knowledge and make it more useful.

(paraphrased from a quotation of William Edwards Deming)






A BIT OF HISTORY

The application of statistics to problems in geology and mining as well as to hydrology dates back a considerable time. For a time, geostatistics meant statistics applied to geology or perhaps more generally to problems in the earth sciences. Beginning in the mid-60's and especially in the mid-70's it became much more closely affiliated with the work of Georges Matheron, and perhaps that connection is still the prevailing one today. Because much of his early work and also that of his students appeared primarily in French, it was not as well known in the US and other countries. Several events began to change that, however. In 1975 a NATO ASI was held near Rome, Italy on Advanced Geostatistics in the Mining Industry. The proceedings contained papers that were primarily in English. This had been preceded by a set of notes (by Matheron) prepared for a summer program in Fontainebleau. These notes were in English but not readily available. A more definitive theoretical article appeared in the J. Applied Probability in 1973.

Professeur Matheron was at the Ecole Nationale Superieure des Mines de Paris (School of Mines), one of the Grandes Ecoles. As part of a general move of research units out from the main location in Paris (adjacent to the Jardin du Luxembourg), Matheron established the Centre de Morphologie Mathematique. Later this became two programs, one on mathematical morphology and one on geostatistics. Matheron retired as Director of the Center only last year. Jean Serra's two-volume series on mathematical morphology and image analysis is well known and is based on Matheron's earlier book on random set theory. Two of Matheron's students were instrumental in implanting geostatistics in North America. Andre Journel moved to Stanford University in 1978 and also co-authored Mining Geostatistics with Ch. Huijbregts. Michel David had earlier moved to the Ecole Polytechnique in Montreal and in 1977 published Geostatistical Ore Reserve Estimation. Journel was in the Department of Applied Earth Sciences, but more recently that department has been closed; he is now in the Department of Petroleum Engineering and has established (with the aid of various oil companies) the Stanford Center for Reservoir Forecasting.

Matheron's work was not very well accepted in the statistical community for a period of time, although a number of prominent statisticians were visitors at Fontainebleau in the '70's, '80's and '90's. In part this was because of a feeling that some of the work was a duplication of results that were already well known but with different names. Matheron's propensity to publish only in French and only in "internal notes" at the Center probably contributed to this perception. Now, however, geostatistics has established a place for itself both within the statistics journals and at national meetings. In the mid-'80's, with the help of M. Armstrong, an index of those notes was published in Mathematical Geology. While it was possible to order xeroxed copies from the Center, there was no generally accessible repository outside of the Center. The index noted above is now well out of date. Again with the assistance of M. Armstrong, a small number of these notes have appeared as journal articles.

Software

In the late '70's the Centre de Geostatistique, Fontainebleau, began a master's level program in geostatistics (two years) which attracted a steady stream of students from industry and government in various countries. In cooperation with Shell Oil and the Bureau de Recherches Geologiques et Minieres (the French counterpart of the USGS), a commercial software package called BLUEPACK was developed. The early version was only ported to the VAX but the successor, ISATIS, is available on a number of workstation platforms. It is marketed in the US by GEOMATH of Houston. Geostatistics without the computer is of little interest; in many ways the developments in geostatistics parallel those in computing, particularly the appearance of PC's and workstations.

Publications and Conferences

Two small volumes on geostatistics focused on mining appeared in English in the '70's, one by Jean-Michel Rendu and one by Isobel Clark. In the late '80's the volume by Isaaks and Srivastava appeared, followed by a book by Noel Cressie (on the more general topic of spatial statistics but including geostatistics).

In the summer of 1983 a second NATO ASI was held at Lake Tahoe, NV with a more international mix and including researchers from a wider set of applications. Thanks to a series of four papers by Richard Webster and some of his students (then at Rothamsted Research Center in England), geostatistics became known in the soil sciences. These appeared in the J. Soil Science (1980-1981). In 1968 in Prague, the International Association for Mathematical Geology was founded and later began publishing the J. of the Int. Assn. for Math. Geology (later the name was officially changed to simply Mathematical Geology). While the journal was not limited to geostatistics, it quickly became a principal place to publish such papers. A third international geostatistics congress was held in Avignon, France in 1988, a fourth in Troia, Portugal in 1992 and the most recent in Wollongong, Australia in 1996. Following the 1983 conference, Andre Journel and Leon Borgman (University of Wyoming) proposed an annual summer retreat in geostatistics aimed at researchers in North America. The first one was held near DuBois, Wyoming in August of 1984. The group was small and families were encouraged; the sessions were informal and no proceedings were produced, but subsequently a newsletter was started which has appeared infrequently since then. A non-organization was founded in 1987 at a meeting in the Chiricahua Mtns southeast of Tucson: there were to be no dues, no membership list and no subscription price for the newsletter, but volunteers would be solicited each year to organize meetings. Several have been held in Canada as well as the US, and in 1996 a meeting was held in Guanajuato, MX. Following the establishment of the newsletter in North America, another newsletter intended for the European community was established.

Following the 1983 meeting several staff members at EPA-Las Vegas became interested in the application of geostatistics to environmental monitoring and assessment. In addition to research support for a number of individuals and programs, EPA commissioned a geostatistical software package, GEO-EAS, which was then released into the public domain. GEO-EAS was a DOS program but included a menuing system that made it fairly friendly, and the price was right. Unfortunately, for various reasons EPA has not continued to support the software and it has not been updated for a number of years. In 1992 Andre Journel and Clayton Deutsch published GSLIB, which included a floppy disk. This was an extensive set of geostatistical programs (FORTRAN source code) and a user's manual. Current versions of the code are available on the website at Stanford. Unfortunately the programs did not include any form of GUI and are intended to be run in batch mode. They are, however, compilable on a variety of platforms. In 1996 Yvan Pannatier published VARIOWIN together with a floppy disk. VARIOWIN is an MS-Windows version of two of the components of GEO-EAS. It allows for much larger data sets than GEO-EAS and also for interactive variogram modeling.

Geostatistics at the University of Arizona

Geostatistics has been taught at the University of Arizona since the fall of 1982; however, collaborative work began much earlier, between Y.C. Kim and Donald Myers and between A. W. Warrick (Soil, Water and Environmental Sciences) and Donald Myers. The courses rapidly attracted students from a variety of departments: Mining Engineering, Hydrology, and Soil, Water and Environmental Sciences. More recently students from Remote Sensing, Plant Pathology, Geography, the Tree-Ring Lab and Renewable Natural Resources have been attracted. This is a direct consequence of the quantitative nature of the research in these various programs.

Other Developments

There were three other developments that should not be overlooked. B. Matern, working in Sweden, developed essentially a parallel theory to Matheron's but with applications primarily in forestry. His work appeared in Swedish in 1960 and was not translated into English until 1986 (Springer-Verlag). L. Gandin, working in the former Soviet Union, applied his work primarily in meteorology and atmospheric sciences, where it was known as Objective Analysis. This work did not appear in English until much later, when he emigrated to Israel. Finally, in 1971, R. Hardy (Iowa State University), working on problems related to the interpolation of gravity data, developed what became known as Radial Basis Functions. His work is much better known in the numerical analysis literature.

Applications

Geostatistics is very much an applied discipline (or perhaps it is not even a discipline); its development has been the work of mining engineers, petroleum engineers, hydrologists, soil scientists and geologists as well as statisticians. There are applications in epidemiology, plant pathology and entomology as well as forestry, atmospheric sciences, global change and geography. There is some overlap with GIS (geographic information systems) and spatial statistics in general. Two other journals should especially be noted: Water Resources Research and the J. Soil Science Society of America. More recently articles have begun appearing in Environmetrics and Remote Sensing of Environment as well as many others too numerous to mention.

As noted above, hydrology was an early application. The activity at three locations should be noted: L. Gelhar's group at MIT (which has links to New Mexico Tech in Socorro), the Hydrology group at Fontainebleau (particularly G. de Marsily, who is now at Universite Paris-Jussieu) and of course the Hydrology Department at the University of Arizona.

PROBLEMS AND OBJECTIVES

In one respect geostatistics might be viewed as simply a methodology for interpolating data on an irregular pattern, but this is too simplistic. A number of interpolation methods/algorithms were already well known when geostatistics began to be known, among them Inverse Distance Weighting and Trend Surface Analysis as well as the much simpler Nearest Neighbor Algorithm.
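For readers unfamiliar with the two simplest methods named above, the following is a minimal sketch of Nearest Neighbor and Inverse Distance Weighting interpolation. The function names, the sample coordinates and values, and the default exponent of 2 are assumptions made for illustration only; they are not taken from any of the software packages mentioned on this page.

```python
import math


def nearest_neighbor(points, values, target):
    """Return the data value at the sampled location closest to target."""
    distances = [math.dist(p, target) for p in points]
    return values[distances.index(min(distances))]


def inverse_distance_weighting(points, values, target, power=2.0):
    """Weighted average of the data, with weights 1 / distance**power.

    The exponent `power` is a user choice (2 is a common default, assumed here).
    """
    weights = []
    for p, v in zip(points, values):
        d = math.dist(p, target)
        if d == 0.0:
            # The target coincides with a data location: return the datum itself.
            return v
        weights.append(1.0 / d ** power)
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


# Hypothetical data: three sampled locations in the plane and their values.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
values = [10.0, 20.0, 40.0]

if __name__ == "__main__":
    print(nearest_neighbor(points, values, (0.9, 0.1)))
    print(inverse_distance_weighting(points, values, (0.5, 0.0)))
```

Both methods produce a single number at the unsampled location and use only the geometry of the data locations; neither says anything about how uncertain that number is.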

First of all, geostatistics is concerned with spatial data. That is, each data value is associated with a location in space and there is at least an implied connection between the location and the data value. "Location" has at least two meanings; one is simply a point in space (which only exists in an abstract mathematical sense) and the second is an area or volume in space. For example, a data value associated with an area might be the average value of an observed variable, averaged over that area. In the latter case the area or volume is often called the "support" of the data. This is closely related to the idea of the support of a measure. Let x, y, ..., w be points (not just coordinates) in 1, 2, or 3 dimensional space and let Z(x), Z(y), ... denote observed values at these locations. For example, these might be the grade of copper, temperature, hydraulic conductivity, or the concentration of a pollutant. Now suppose that t is a location that is not "sampled". The objective then is to estimate/predict the value Z(t), given the data values (and the data locations as well as the location t). If only this information is given then the problem is ill-posed, i.e., it does not have a unique solution. One way to obtain a unique solution is to introduce a model into the problem. There are two ways to do this; one is deterministic and the second is stochastic or statistical. Both approaches must somehow incorporate the idea that there is uncertainty associated with the estimation/prediction step. The value at the unsampled location is not itself random but our knowledge of it is uncertain. One approach then is to treat Z(x), Z(y), ... and Z(t) as being the values of random variables. If the joint distribution of these random variables were known then the "best" estimator (best meaning unbiased and having minimal variance of the error of estimation) would be the conditional expectation of Z(t) given the values of the other random variables.
However, the data consist of only one observation of each of the random variables Z(x), Z(y), ... and none of the random variable Z(t); hence it is not possible to estimate or model this distribution using standard ways of modeling or fitting probability distributions.
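As a hedged illustration of the conditional expectation mentioned above, consider the hypothetical case in which the joint distribution is known and multivariate normal. This is an assumption made only for the sketch, not something the text asserts. The data locations are relabeled x_1, ..., x_n, and the symbols m, m_t, C and c_0 are introduced here for illustration.

```latex
\begin{aligned}
\mathbf{Z} &= \bigl(Z(x_1), \dots, Z(x_n)\bigr)^{\mathsf T}, \qquad
\mathbf{m} = E[\mathbf{Z}], \qquad m_t = E[Z(t)], \\
C &= \operatorname{Cov}(\mathbf{Z}, \mathbf{Z}), \qquad
\mathbf{c}_0 = \operatorname{Cov}\bigl(\mathbf{Z}, Z(t)\bigr), \\
E\bigl[Z(t) \mid \mathbf{Z} = \mathbf{z}\bigr]
  &= m_t + \mathbf{c}_0^{\mathsf T} C^{-1} (\mathbf{z} - \mathbf{m}), \\
\operatorname{Var}\bigl[Z(t) \mid \mathbf{Z} = \mathbf{z}\bigr]
  &= \operatorname{Var}[Z(t)] - \mathbf{c}_0^{\mathsf T} C^{-1} \mathbf{c}_0 .
\end{aligned}
```

Under this assumption the conditional expectation is linear in the data and depends on the joint distribution only through means and covariances. The difficulty described above remains: with a single observation at each location, those quantities cannot be estimated by the usual methods without further modeling assumptions.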