Survey of Available Data
- Data Required
- Potential Sources of Data
- Describing and Evaluating Potential Data
One of the most important elements of developing a GIS is finding and utilizing the appropriate data. The form of the data is critical to the overall database design and the success of the analyses performed with the system. The quality of the results produced from GIS analyses and applications ultimately depends on the quality of the data used. GIS data can be obtained in various formats from many different sources. Requirements for quality, scale and level of completeness will depend upon the needs of the application. Once data requirements are developed, there are usually many data options from which the potential user can choose. One choice is whether to utilize government- or privately-developed data; cost will be a major difference between the two. Other choices may involve data currency, scale, accuracy and, depending upon the application, the data structure, platform specifications or even media format.
This guideline discusses available GIS data, including evaluating data requirements, the various types and sources of available GIS data, and potential data sets. It also discusses potential opportunities for data sharing.
Master Data List (from Needs Assessment)
One of the products available from a Needs Assessment is a Master Data List. Based upon descriptions of the tasks future GIS users will want to perform, a listing of the various required data is developed.
From the Needs Assessment you will have identified:
- the data entities
- the attributes associated with the entities
The Master Data List is used to prepare a database plan which includes:
- a logical/physical design of the GIS database
- procedures for building the GIS database
- procedures for managing and maintaining the database
In this guide, the procedures for identifying and documenting existing data will be described.
Types of Data
There are many different types of data which can be utilized by a GIS. Each data type has its own unique properties and potential for contributing to the overall quality and functionality of the GIS database. These data types are mapped data, tabular data listings, remotely sensed imagery, and scanned images. The following sections describe each of them.
Mapped Data/Map Series
Mapped data may refer to published maps found in an existing map series or collection. These maps should be logically classified based upon their data content (e.g., topographical, hydrological data). Maps which meet National Map Accuracy Standards are usually produced by federal or state government agencies. Paper maps, if not already in digital format, can be utilized in developing the database through vector tablet digitizing or scanning.
Mapped data can also be identified as geographic data which has been digitized into the vector data structure. Vector map data may be found with or without real-world coordinate information and may or may not have topological relationships. Many organizations which digitized their map data in the past did so utilizing CAD (computer aided drafting), and thus were not able to establish topological relationships between their spatial elements. Today, software exists which allows CAD data to be quickly converted into topologically correct geographic data which can then be assigned coordinate data within a GIS. Many alternative sources of digital spatial data thus exist, in addition to the volumes of topologically correct geographic data available from local, state and federal governments.
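To make the idea of "establishing topological relationships" concrete, the following is a minimal sketch, not the method of any particular conversion package. It assumes CAD linework has been exported as simple pairs of endpoints, snaps nearly coincident endpoints to shared nodes, and records which arcs meet at each node. The function name, the tolerance value and the sample coordinates are all hypothetical.

```python
from collections import defaultdict


def build_topology(lines, tolerance=0.01):
    """Derive an arc table and node connectivity from unstructured segments.

    lines: list of ((x1, y1), (x2, y2)) tuples, e.g. exported from CAD.
    tolerance: endpoints are snapped to a grid of this size so that nearly
    coincident endpoints share one node (a simplification; points that
    straddle a grid boundary would need a true distance search).
    """
    node_ids = {}  # snapped grid cell -> node id
    arcs = []      # (arc id, from-node, to-node)

    def node_for(point):
        key = (round(point[0] / tolerance), round(point[1] / tolerance))
        if key not in node_ids:
            node_ids[key] = len(node_ids) + 1
        return node_ids[key]

    for arc_id, (start, end) in enumerate(lines, start=1):
        arcs.append((arc_id, node_for(start), node_for(end)))

    # Connectivity: the arcs that meet at each node.
    arcs_at_node = defaultdict(list)
    for arc_id, from_node, to_node in arcs:
        arcs_at_node[from_node].append(arc_id)
        arcs_at_node[to_node].append(arc_id)
    return arcs, dict(arcs_at_node)


# Hypothetical CAD segments forming a triangle; the second segment starts
# 0.001 units away from where the first one ends.
cad_lines = [
    ((0.0, 0.0), (10.0, 0.0)),
    ((10.0, 0.001), (10.0, 5.0)),
    ((10.0, 5.0), (0.0, 0.0)),
]
arcs, arcs_at_node = build_topology(cad_lines)
```

With the arc and node tables in place, a GIS can answer connectivity questions (which lines join at a point) that raw CAD linework cannot.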
Attribute Tables or Lists
A readily available form of GIS input, data tables and listings are available from many different organizations and government agencies. Various data tables can be obtained as GIS input to provide additional attributes which will be associated with spatial data elements. Table records are linked to the spatial elements through a shared key field. Database, spreadsheet and ASCII-delimited text tables are among the import formats supported by many GIS packages. Any organization that maintains a database, or uses spreadsheets to organize its records, is able to create digital listings. Tables and lists are available from almost any government organization as long as the data does not involve privacy issues which would impede access.
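The key-based link between a table and spatial elements can be sketched as follows. This is an illustration only: the field names, parcel numbers and values are hypothetical, and a real GIS package would perform the join with its own import tools.

```python
import csv
import io

# Hypothetical ASCII-delimited attribute table keyed on a parcel identifier.
attribute_text = """parcel_id,land_use,sale_price
101,Residential,85000
102,Commercial,240000
"""

# Hypothetical spatial features, each carrying the same key field.
features = [
    {"parcel_id": "101", "centroid": (1250.0, 3400.0)},
    {"parcel_id": "102", "centroid": (1310.0, 3385.0)},
    {"parcel_id": "103", "centroid": (1402.0, 3350.0)},
]


def join_attributes(features, table_text, key):
    """Attach table rows to features that share the same key value."""
    rows = {row[key]: row for row in csv.DictReader(io.StringIO(table_text))}
    joined, unmatched = [], []
    for feature in features:
        row = rows.get(feature[key])
        if row is None:
            unmatched.append(feature[key])   # feature with no table record
            joined.append(dict(feature))
        else:
            joined.append({**feature, **row})
    return joined, unmatched


joined, unmatched = join_attributes(features, attribute_text, "parcel_id")
```

Reporting the unmatched keys is a useful check when evaluating a table as a data source, since it shows how completely the table covers the spatial features.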
Image Data (Remotely Sensed Images, Aerial Photos)
Image data is an excellent source of GIS input data. It consists mainly of remotely sensed images, which include both aerial photographs (in analog or digital format) and satellite images. Aerial photos are normally captured with analog cameras, and the resulting photographs can be a very important data source for a GIS. Photographs that are not digital can be digitized using a vector digitizing tablet, or they can be scanned and then input into the GIS as an image. In either case, the digital version will normally require rectification and re-scaling in order to correct the distortions common to most aerial photography.
Until they are converted into a raster GIS format, basic raster images such as satellite imagery or scanned aerial photographs do not offer any topological connectivity or potential for GIS analysis. Satellite imagery is captured in raster digital format. With the advent of an open display architecture, many GIS packages are able to integrate both raster and vector data into the same display. Remotely-sensed image data is useful within an editing environment for display as a backdrop for both heads-up digitizing and updating of vector layers, for verification, or for conversion into raster GIS layers and then subsequently into vector data layers.
Most remote sensing systems can capture infrared as well as visible light, separating the incoming radiation into bands of differing wavelength which, together or alone, may show much more information than a normal camera recording only the visible spectrum. Most GIS packages can display these images and assign different colors to the various bands for effective display of the data. GIS packages today also allow for the processing of these images in order to rectify, warp, and geo-reference the imagery as necessary so that they will be useful as scaled images. After such procedures, geo-referenced images can be overlaid with similarly geo-referenced vector data for effective display.
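Geo-referencing an image amounts to defining a transformation from pixel positions to map coordinates. A minimal sketch follows, assuming the simplest common case of a six-parameter affine transformation with the parameters given in the order conventionally used by image "world files" (A, D, B, E, C, F). The parameter values are hypothetical, and real rectification of aerial photography generally needs control points and more elaborate models than this.

```python
def pixel_to_map(col, row, params):
    """Convert a pixel position to map coordinates with an affine transform.

    params = (A, D, B, E, C, F), where
      x = A * col + B * row + C
      y = D * col + E * row + F
    A and E are the pixel sizes in x and y (E is normally negative because
    image rows increase downward), B and D are rotation terms, and (C, F)
    is the map position of the centre of the upper-left pixel.
    """
    a, d, b, e, c, f = params
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


# Hypothetical image: 2-unit pixels, no rotation, upper-left pixel centre
# located at (500000, 4200000) in the map coordinate system.
params = (2.0, 0.0, 0.0, -2.0, 500000.0, 4200000.0)

upper_left = pixel_to_map(0, 0, params)
inside = pixel_to_map(100, 50, params)
```

Once every pixel can be placed in map coordinates this way, the image can be displayed in register with vector layers that use the same coordinate system.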
Scanned Images (Pictures, Diagrams)
Scanned raster images can be displayed in a GIS the same way that satellite images are displayed. Any raster image, whether it be a scanned map, photograph or diagram, can be easily input into a GIS for display purposes. Integrating scanned images into a GIS display, or converting raster data into raster GIS format, are fairly routine capabilities for most high-end GIS packages. As discussed earlier, a GIS allows for the assignment of coordinates to raster image data.
Scanned maps (as opposed to digitized vector representations) can be effective backgrounds upon which other GIS vector layers can be displayed. Scanned maps usually contain much valuable annotation which would be very time-consuming to duplicate in a vector environment. Including raster images enhances an application by giving the user a visual display which improves understanding of the data. Scanned photographs are especially effective. In many GIS packages, links can be established between an image viewer, which displays scanned images, and vector geographic features so that when an event sequence is initiated (e.g., selecting a vector feature), the image viewer appears with the specified scanned image.
There are three major formats in which GIS-usable data can be obtained: hardcopy/eye-readable format, analog image format, and fully digital format. Unique types of information can be accessed from each of these data formats.
Hardcopy (Paper, Linen Or Film)/Eye-Readable
Hardcopy maps are easily obtained from a wide variety of organizations. As a form of GIS source data, they can be digitized on a digitizing tablet into vector GIS format, or scanned and then converted into raster GIS format. Although paper and linen maps have potential accuracy problems (distortions caused by shrinkage and expansion of the media), much unique geographic data can only be found on these maps. One example is geographic data for a particular time period: much of the readily available digital data represents only the most current, updated data for a region, so for geographic data from before 1970 the only choice may be a paper or linen map. Use a film copy of the source document where available, as this will be the most stable medium.
Accessing dated tabular information for the development of an attribute database may similarly require the use of paper documents. Organizations which have existed since before digital filing systems once had to keep all of their data in paper "hard-copy" format. Some of these older records may since have been converted into digital form. In other cases, hard-copy documents are the only versions of dated material. To conserve space and preserve the documents, many may have been copied onto microfiche.
Aerial photography is an abundant form of geographic data. Photogrammetry (aerial mapping) is a common way of creating an accurate and up-to-date land base. Aerial photos provide the raw data which is necessary for various planimetric and topographic mapping applications. Photographic images are a very rich data source in that many geographic features can be seen clearly on a photograph but may not be seen on a paper map or in a vector digital file (e.g., a large clearing within a wooded area would not be differentiated on most paper maps, but it is clearly visible on the aerial photo).
Aerial photography is available from many sources (e.g., USGS, DOT, county agencies). The federal government has recently developed the National Aerial Photography Program (NAPP), in which states that desire to have their counties flown may split the cost with the federal government. Many useful products are derived from the NAPP, including 1:12,000 hard- or soft-copy orthophotographs. An orthophoto is a scanned aerial photograph which has been digitally rectified using control points and a digital elevation model. The digital versions are especially useful for GIS applications. If the type of digital aerial photography needed is not available, organizations can issue a request for proposal to solicit bids for aerial mapping, although this can be very expensive.
Within the digital format category, many different varieties of data are available, and these options are becoming as numerous as those available in paper maps. For map graphics, there are two data structures which are readily integrated into today's GIS: raster and vector. Tabular data is most frequently found in digital format. Digital spatial data currently available in raster format includes:
- Scanned maps and aerial photography
- Satellite Imagery
- Digital Orthophotography
- Digital Elevation Models
Digital spatial data currently available in vector format includes:
- Topological vector linework
- Non-topological vector linework
- Annotation layers
Digital attribute data which can be input into a GIS includes file types associated with various kinds of software: spreadsheet, database and word-processing. File formats which can be utilized include dBase, Excel, and ASCII-delimited text.
Potential Sources of Data
Government is the largest single source of geographic data. Data for most any GIS application can be obtained through federal, state, or local governments. Various data formats, whether paper, image or digital, can all be obtained through government resources. The following subsections give basic descriptions of the datasets which are available through some federal, state and regional/local government agencies.
Federal Data Sources
The federal government is an excellent source of geographic data. Two of the largest spatial databases which are national in coverage include the US Geological Survey's DLG (Digital Line Graph) database, and the US Census Bureau's TIGER (Topologically Integrated Geographic Encoding and Referencing) database. Both systems contain vector data with point, line and area cartographic map features, and also have attribute data associated with these features. The TIGER database is particularly useful in that its attribute data also contains census demographic data which is associated with block groups and census tracts. This data is readily used today in a variety of analysis applications. Many companies have refined various government datasets, including TIGER, and these datasets offer enhancements in their attribute characteristics, which increases the utility of the data. Unfortunately, problems associated with the positional accuracy of these datasets usually remain as these are much more difficult to resolve. Satellite and digital orthophoto imagery, raster GIS datasets, and tabular datasets are also available from various data producing companies and government agencies.
The following information on federal agencies was taken from the Manual of Federal Geographic Data Products developed by the Federal Geographic Data Committee (FGDC). To contact the FGDC:
Federal Geographic Data Committee Secretariat
US Geological Survey
590 National Center
Reston, VA 22092
Phone: (703) 648-4533
The manual is organized by federal department and, within each department, by agency and bureau. For each, it lists the types of data which are available (e.g., data structure, scale, software export format, source data, currency, and the applications the data can be used for) and the agencies from which they can be acquired. The reader is encouraged to consult the manual for further information regarding the geographic data products of these organizations.
DEPARTMENT OF AGRICULTURE
- The Agriculture Stabilization & Conservation Service: R
- Forest Service: B, H, L, Sur, T
- Soil Conservation Service: H, Sub, Sur
DEPARTMENT OF COMMERCE
- Bureau of the Census: B, S, H, Sur
- Bureau of Economic Analysis: B, S
- National Environmental Satellite Data & Info. Service: A, Ged, Gep, H, R, Sub, Sur, T
- National Ocean Service: Ged, H, R, Sub, Sur, T
- National Weather Service: A, R, T
DEPARTMENT OF DEFENSE
- Defense Mapping Agency: B, H, Sur, T
DEPARTMENT OF HEALTH & HUMAN SERVICES
- Centers for Disease Control: B, S
DEPARTMENT OF THE INTERIOR
- Bureau of Land Management: B, H, L, R
- Bureau of Mines: Sub
- Bureau of Reclamation: H, Sur
- Minerals Management Service: B, H, L
- National Park Service: B, H, Sur, T
- US Fish & Wildlife Service: H, Sur
- US Geological Survey: A, B, S, Ged, Gep, H, L, R, Sub, Sur, T
DEPARTMENT OF TRANSPORTATION
- Federal Highway Administration: Sur
INDEPENDENT AGENCIES
- Federal Emergency Management Agency: H
- National Aeronautics & Space Administration: H, L, R, Sub, Sur
- Tennessee Valley Authority: B, S, Ged, H, L, R, Sub, Sur, T
Federal Agency Data Product Codes:
- A = Atmospheric
- B = Boundaries
- Ged = Geodetic
- Gep = Geophysics
- H = Hydrologic
- L = Land Ownership
- R = Remotely Sensed
- S = Socioeconomic
- Sub = Subsurface
- Sur = Surface and Manmade Features
- T = Topography
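When searching the listing for a particular kind of data, it can help to hold the product codes and agency entries in a simple lookup structure. The sketch below is only an illustration of that idea: it copies the code definitions and a handful of the agency entries from the listing above, and the function name is hypothetical.

```python
# Product codes as defined in the listing.
PRODUCT_CODES = {
    "A": "Atmospheric",
    "B": "Boundaries",
    "Ged": "Geodetic",
    "Gep": "Geophysics",
    "H": "Hydrologic",
    "L": "Land Ownership",
    "R": "Remotely Sensed",
    "S": "Socioeconomic",
    "Sub": "Subsurface",
    "Sur": "Surface and Manmade Features",
    "T": "Topography",
}

# A subset of the agencies in the listing, with their product codes.
AGENCY_PRODUCTS = {
    "Forest Service": ["B", "H", "L", "Sur", "T"],
    "Soil Conservation Service": ["H", "Sub", "Sur"],
    "Bureau of the Census": ["B", "S", "H", "Sur"],
    "National Weather Service": ["A", "R", "T"],
    "Bureau of Mines": ["Sub"],
}


def agencies_offering(category_name):
    """Return the agencies (alphabetically) that list the named data type."""
    code = next(c for c, name in PRODUCT_CODES.items() if name == category_name)
    return sorted(
        agency for agency, codes in AGENCY_PRODUCTS.items() if code in codes
    )


hydrologic_sources = agencies_offering("Hydrologic")
subsurface_sources = agencies_offering("Subsurface")
```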
National Spatial Data Infrastructure (NSDI)
There is a wealth of geographic data which can be accessed from federal and state agencies over the Internet. Most federal agencies which deal with geographic data have File Transfer Protocol (FTP) servers storing various geographic datasets. These servers allow organizations to download digital data over the Internet. One of the most populated servers is the US Geological Survey FTP server, which holds all of the USGS Digital Line Graph files (the USGS server FTP address can be found by calling the USGS at 1-800-USA-MAPS). The Census Bureau also has an FTP server which allows organizations to access portions of its TIGER/Line file database. Government FTP servers can be searched for on the Internet using ARCHIE.
Many federal and state agencies and corporations which deal with geographic data have Internet home pages which can be accessed on the World-Wide-Web. The US Geological Survey (USGS) home page, like the USGS FTP server, contains a wealth of information about USGS geographic data and how it can be used. From the USGS home page it is possible to search for, view, and download USGS data. One can also obtain USGS Fact Sheets, general information on the USGS, educational resources, publications, research papers, and informational resources on other Internet sites. Most federal agencies have their own home pages, which are structured similarly to the USGS home page. Most major GIS software vendors also have Internet home pages. Environmental Systems Research Institute (ESRI), Inc. has an excellent home page (URL address: http://www.esri.com) which contains a wide assortment of useful information.
State Government Agencies
There are many New York State agencies which are good sources of GIS data. Three of these are the Department of Transportation, the Department of Environmental Conservation, and the Office of Real Property Services.
The New York State Department of Transportation (NYSDOT) offers data in paper and digital file formats. Paper topographic maps can be obtained at various scales. Most applicable to GIS needs, the NYSDOT has developed digital spatial files which are part of the New York State County Base Map Series. The Base Map files, though created with a CADD (Computer Aided Design and Drafting) system, have been designed for use in a GIS. The Department has developed a file structure which allows for their conversion into a topological GIS format. There are various data layers available within this database including: Roads, Boundaries, Hydrography, Miscellaneous Transportation, and Names (NYSDOT, 1994). For further information, see Digital Files from the County Base Map Series from the NYSDOT.
The New York State Department of Environmental Conservation (NYSDEC) is another state organization which offers GIS data in varying formats. In 1990, the NYSDEC compiled an in-house inventory of its geographic data sources called the "Geographic Data Source Directory." The directory contains information on all of the DEC's geographic data sources with potential GIS applications. The DEC divided its data into the following categories: Air Resources, Construction Management, Fish and Wildlife, Hazardous Substances Regulation, Hazardous Waste Remediation, Lands and Forests, Law Enforcement, Management Planning and Information Systems Development, Marine Resources, Mineral Resources, Operations, Regulatory Affairs, Solid Waste, and Water (Warnecke et al, 1992). A copy of the directory is available from NYSDEC. Call your local office or the main office in Albany.
The New York State Office of Real Property Services (ORPS) has developed a database known as RPIS (Real Property Information System) which contains information on all tax parcels in the state. Each parcel record contains a coordinate representing the center point of the parcel and attribute information which includes a unique land-based parcel identification number and descriptive information such as land use, location, sales information, exemptions, and other parcel attributes. RPIS data is available to local assessors, real property assessment offices, corporations and the general public for a nominal fee.
The New York State Department of Health (DOH) uses GIS in its work in analyzing and mapping environmental health risk areas and hazardous waste sites. The DOH has a database containing Census Bureau TIGER files and parcel maps. These GIS files can be acquired by the public.
Some other agencies which have GIS databases and which may have data usable in a GIS include: the Adirondack Park Agency (APA); the Hudson River Valley Greenway; New York Metropolitan Transportation Council; the Office of Parks, Recreation and Historic Preservation; Department of Public Service; State Emergency Management Office; New York City Department of Environmental Protection (Hilla, 1995); State Data Center Affiliates (various NYS Counties). Please note these are all examples and not intended to be an exhaustive list.
Regional And Local Governments
Many regional and local government agencies and organizations maintain GIS databases. These agencies may have data sharing arrangements with local companies and other municipalities. Information identifying which government agencies and companies have available GIS data layers may be found in regional or local GIS data directories. One such regional data directory developed within New York State is the Regional Directory of Geographic Data Sources for Genesee/Finger Lakes Counties. The directory contains information on participating government agencies and companies which have GIS data layers, then lists information regarding these layers, and provides the name, address and phone number of the person within the organization who can be contacted for further details or data sharing arrangements (GIS/SIG, 1995).
Private Data Firms
There are companies that will develop data for a local government. These companies will develop programs based on contract data conversion or public/private partnerships. Contract data conversion firms are available for those organizations that wish to have custom geographic datasets developed. Usually, the development of these datasets involves the client organization providing existing source data (e.g., paper maps) to the data development firm, which then converts the data into digital format.
In public/private partnerships, the company will work out an agreement with the local government that will provide data conversion but also retain the ability to market, sell and/or use the digital data that was created. Public/private agreements are just emerging as a method for creating GIS databases cost effectively. When considering a public/private partnership, issues such as ownership, access, freedom of information requirements and long-term data maintenance must be addressed as well as the cost sharing of building the database.
Describing and Evaluating Potential Data
The next step is to survey the various departments within the local government, and other external sources, to determine what data is available for use in the GIS and what condition the data is in.
The first task in this survey is to document the data by developing a metadata file for each available database. The metadata file serves two purposes: 1) it provides the information used to evaluate the data for use in a GIS, and 2) it fulfills the metadata requirements for the data once it is used in a GIS.
For each potential data source for the GIS database, the map series, photos, tabular files, etc. must be identified, reviewed, and evaluated for suitability for use in the GIS. Maps, photos, and remotely sensed data are the most likely sources and should be evaluated for:
- appropriate scale
- projection and coordinate system
- availability of geodetic control points
- areal coverage
- completeness and consistency across entire area
- symbolization of entities (especially positional accuracy of symbol due either to size of symbol or off-set placement on map)
- quality of linework and symbols
- general readability and legibility for digitizing (labels)
- quality and stability of source material (paper/mylar)
- amount of manual editing needed prior to conversion
- edge match between map sheets
- existence and type of unique identifiers for each entity (entities shown in a map series often use so-called "intelligent" keys, in which the identifier for an object contains the map sheet number and/or other embedded locational codes; in database design, it is much better to avoid "intelligent" keys of this type, particularly locational codes)
- positional and attribute accuracy
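The caution about "intelligent" keys in the list above can be illustrated with a small sketch. It assumes a hypothetical entity (a hydrant) whose source-map label embeds a sheet number, and shows one common alternative: a meaningless sequential key, with the map sheet stored as an ordinary attribute. All names and values are invented for illustration.

```python
from dataclasses import dataclass
from itertools import count

_next_entity_id = count(1)  # simple sequential key generator


@dataclass
class MappedEntity:
    entity_id: int   # surrogate key: carries no meaning and never changes
    map_sheet: str   # location is kept as an ordinary, editable attribute
    label: str       # identifier as drawn on the source map


def register_entity(map_sheet, label):
    """Create an entity with the next available surrogate key."""
    return MappedEntity(next(_next_entity_id), map_sheet, label)


# The source map labels this hydrant with an "intelligent" key that embeds
# the sheet number; it is recorded, but not used as the database key.
hydrant = register_entity(map_sheet="27-B", label="27-B-0014")

# A related table is keyed on the surrogate key.
linked_records = {hydrant.entity_id: "inspected 1994"}

# The map series is later re-indexed and the hydrant falls on another sheet.
hydrant.map_sheet = "31-A"

# The link survives because the key itself did not have to change.
still_linked = linked_records[hydrant.entity_id]
```

Had the sheet-based label been the key, every related record would have needed updating when the sheet index changed.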
All of the above information needs to be documented for each potential data source. If a particular data source is then used to build part of the GIS database, some of this information will become part of the permanent metadata.
The metadata software accompanying this guideline provides three tables for recording the basic metadata about a potential data source. The first table contains information on the source document (or file); the second table can describe each entity contained on a source document; and the third table can describe each attribute of an entity. Once again, only the most basic entries have been included in the supporting software in order to keep the software simple and straightforward. A particular user may wish to expand the tables provided to meet his/her specific needs.
The following lists the fields of the three tables that contain source data information:
|Source Document Name:||Parcel Map|
|Source ID #:||1|
|Source Organization:||Town of Amherst|
|Type of Document:||Map|
|Number of Sheets (map, photo, etc.):||200|
|Coordinate System:||State Plane|
|Control Accuracy Map:||National Map Accuracy Standard|
|Scale:||Variable; 1" = 50 ft To 1" = 200 ft|
|Reviewed By:||Lee Stockholm|
|Spatial Extent:||Town of Amherst|
Entities Contained In Source
|Source ID #:||1|
|Estimate Volume Spatial Entity:||126 per map sheet|
|Accuracy Description Spatial Entity:||National Map Accuracy Standard|
|Reviewed By:||Lee Stockholm|
Attributes By Entity
|Source ID #:||1|
|Attribute Name:||SBL Number|
|Attribute Description:||Section, Block, and Lot Number|
|Code Set Name:||N/A|
|Accuracy Description Attribute:||N/A|
|Reviewed By:||John Henry|
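For readers who wish to keep the same information in their own software, the three tables above could be represented roughly as follows. This is a sketch only and is not the structure of the metadata software accompanying this guideline; the class and field names are assumptions derived from the field labels shown above, and the sample values are those from the example.

```python
from dataclasses import dataclass


@dataclass
class SourceDocument:
    source_id: int
    name: str
    organization: str
    document_type: str
    number_of_sheets: int
    coordinate_system: str
    control_accuracy: str
    scale: str
    reviewed_by: str
    spatial_extent: str


@dataclass
class SourceEntity:
    source_id: int            # links the entity to its source document
    estimated_volume: str
    accuracy_description: str
    reviewed_by: str


@dataclass
class EntityAttribute:
    source_id: int            # links the attribute to its source document
    name: str
    description: str
    code_set_name: str
    accuracy_description: str
    reviewed_by: str


parcel_map = SourceDocument(
    source_id=1,
    name="Parcel Map",
    organization="Town of Amherst",
    document_type="Map",
    number_of_sheets=200,
    coordinate_system="State Plane",
    control_accuracy="National Map Accuracy Standard",
    scale='Variable; 1" = 50 ft to 1" = 200 ft',
    reviewed_by="Lee Stockholm",
    spatial_extent="Town of Amherst",
)
entities = [
    SourceEntity(1, "126 per map sheet", "National Map Accuracy Standard",
                 "Lee Stockholm"),
]
attributes = [
    EntityAttribute(1, "SBL Number", "Section, Block, and Lot Number",
                    "N/A", "N/A", "John Henry"),
]


def attributes_for(source, all_attributes):
    """Return the attribute records that belong to a source document."""
    return [a for a in all_attributes if a.source_id == source.source_id]


parcel_attributes = attributes_for(parcel_map, attributes)
```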
Additional Criteria For Evaluating Potential Data Sources
As the survey is being conducted, it is important to consider the following issues about the data:
- Is the data current and what is its continuing availability?
- Is the data suitable for intended applications?
- Is the quality of the data appropriate for the type of applications needed? This should include both locational and attribute accuracy.
- Is the data cost effective?
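When judging whether locational accuracy suits an application, it helps to translate a map's scale into a ground distance. The sketch below assumes the horizontal criterion of the United States National Map Accuracy Standards as commonly stated (no more than 10 percent of tested points in error by more than 1/30 inch at publication scales larger than 1:20,000, or 1/50 inch at 1:20,000 and smaller); consult the standard itself before relying on these figures. The function names are hypothetical.

```python
def scale_denominator(feet_per_inch):
    """Convert an engineering scale (1 inch = N feet) to the denominator
    of a representative fraction, e.g. 1 inch = 50 ft -> 1:600."""
    return feet_per_inch * 12


def nmas_horizontal_tolerance_ft(denominator):
    """Ground distance, in feet, corresponding to the assumed NMAS
    horizontal tolerance at a map scale of 1:denominator."""
    # Scales larger than 1:20,000 have denominators smaller than 20,000.
    map_inches = 1 / 30 if denominator < 20000 else 1 / 50
    return map_inches * denominator / 12


# The parcel map example above ranges from 1" = 50 ft to 1" = 200 ft.
largest_scale = scale_denominator(50)
smallest_scale = scale_denominator(200)
tolerance_at_50 = nmas_horizontal_tolerance_ft(largest_scale)
tolerance_at_200 = nmas_horizontal_tolerance_ft(smallest_scale)

# For comparison, a 1:24,000 topographic map.
tolerance_at_24000 = nmas_horizontal_tolerance_ft(24000)
```

Under these assumptions the tolerance is roughly 1.7 ft at 1" = 50 ft, 6.7 ft at 1" = 200 ft, and 40 ft at 1:24,000, which shows why source scale matters so much for parcel-level applications.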
FOR FURTHER INFORMATION:
The Manual of Federal Geographic Data Products, developed by the Federal Geographic Data Committee, is an excellent source for information on geographic datasets produced by agencies within the federal government. Listed by federal agencies and bureaus within each federal department, there are listings on the types of data which are available (e.g. concerning data structure, scale, software export format, source data, currency, what applications the data can be used for), and from which agencies they can be acquired.
To order contact:
Federal Geographic Data Committee Secretariat
US Geological Survey
590 National Center
Reston, VA 22092
Phone: (703) 648-4533
New York State Department of Transportation data listing: Digital Files from the County Base Map Series.
Map Information Section
Mapping and Geographic Information Systems Bureau
New York State Department of Transportation
State Office Campus
Building 4, Room 105
Albany, New York 12232
Phone: (518) 457-3555
Example of a Regional Level GIS Data Directory:
1995 Regional Directory of Geographic Data Sources, developed by the GIS/SIG (Geographic Information Sharing/Special Interest Group) for New York State's Genesee/Finger Lakes Region Counties. The directory is a listing of the various data sources which are available from local companies and local government agencies in the Genesee/Finger Lakes Region.
The International GIS Sourcebook, published by GIS World, Inc., is an annual publication which contains an excellent "Data Source Listings" chapter. It provides a wealth of information on companies which produce GIS datasets and also provides descriptions of the data they produce. The chapter also lists the different types of spatial data produced by public agencies, and lists data availability and contacts.
Hilla, Christine M. "The Revolution of Geographic Information Systems in Land Use and Environmental Planning in New York State," Environmental Law in New York, Vol. 6, No. 3, March 1995.
Montgomery, Glenn E. and Harold C. Schuch, 1993. GIS Data Conversion Handbook. Fort Collins, CO: GIS World, Inc., pp. 89-91.
NYSDOT (New York State Department of Transportation), Digital Files from the County Base Map Series, Mapping and Geographic Information Systems Bureau (1994).
Warnecke, L., J. Johnson, K. Marshall and R. Brown, State Geographic Information Activities Compendium, 294 Council of State Government (1991).