GIS Development Guides
Database Planning and Design
- Introduction
- Selecting Sources for the GIS Database
- The Logical/Physical Design of the GIS Database
- Procedures for Building the GIS Database
- Procedures for Managing and Maintaining the Database
- GIS Data Sharing Cooperatives
- Glossary
1. INTRODUCTION
The primary purpose of this phase of the GIS development process is to specify "how" the GIS will perform the required applications. Database planning and design involves defining how graphics will be symbolized (i.e., color, weight, size, symbols, etc.), how graphics files will be structured, how nongraphic attribute files will be structured, how file directories will be organized, how files will be named, how the project area will be subdivided geographically, how GIS products will be presented (e.g., map sheet layouts, report formats, etc.), and what management and security restrictions will be imposed on file access. This is done by completing the following activities:
- Select a source (document, map, digital file, etc.) for each entity and attribute included in the E-R diagram
- Set up the actual database design (logical/physical design)
- Define the procedures for converting data from source media to the database
- Define procedures for managing and maintaining the database
The database planning and design activity is conducted concurrently with the pilot study and/or benchmark activities. Actual procedures and the physical database design cannot be completed before specific GIS hardware and software have been selected. At the same time, GIS hardware and software selection cannot be finalized until the selected GIS can be shown to adequately perform the required functions on the data. Thus, these two activities (design and testing) need to be conducted concurrently and iteratively.
In many cases, neither database design nor hardware and software selection is an unconstrained activity. First, the overall environment within which the GIS will exist must be evaluated. If there are "legacy" systems (data, hardware or software) with which the new GIS must be compatible, then design choices may be limited. GIS hardware and software configurations and database organizations that are not compatible with the existing conditions should be eliminated from further consideration. Second, other constraints from an organizational perspective must be evaluated. It may, for example, be preferable to select a specific GIS or database structure because other agencies with whom data will be shared have adopted a particular system. Finally, assuming that the intended GIS (whether it will be large or small) will be part of a corporate or shared database, the respective roles of each participant need to be evaluated. Major players in a shared database (e.g., county, city, or regional unit of government) will have greater flexibility of choice than smaller players (town, village, or special purpose GIS applications). This does not mean that the latter must always go with the majority, but simply that the shared GIS environment must be realistically evaluated. In fact, one way for the smaller participants in a shared GIS to ensure their needs are considered is to fully document their needs and resources using the procedures recommended in these guidelines.
Finally, with the completion of both the database planning and design and the pilot study/benchmark activities, sufficient detailed data volume estimates and GIS performance information will be known to calculate reliable cost estimates and prepare production schedules. This becomes the final feasibility check before major resources are committed to data conversion and GIS acquisition.
What is already known about the GIS requirement
Prior phases of the GIS development process should have produced the following information which is needed at this time:
- A complete list of data, properly defined and checked for validity and consistency (from the master data list, E-R data model and metadata entries).
- A list of potential data sources (maps, aerial photos, tabular files, digital files, etc.) cataloged and evaluated for accuracy and completeness (from the available data survey). This inventory would also include all legacy data files, either within the agency or elsewhere, which must be maintained as part of the overall shared database.
- The list of functional capabilities required of the GIS (from needs assessment).
2. SELECTING SOURCES FOR THE GIS DATABASE
This activity involves matching each entity and its attributes to a source (map, document, photo, digital file). The information available for this task is as follows:
- List of entities and attributes from the conceptual design phase
- The list of surveyed data sources from the Available Data Survey and their recorded characteristics in the metadata tables Source Documents, Entities Contained in Source, and Attributes by Entity.
Source Documents
| Source Document Name: | Parcel Map |
| Source ID #: | 1 |
| Source Organization: | Town of Amherst |
| Type of Document: | Map |
| Number of Sheets (map, photo, etc.): | 200 |
| Source Material: | Mylar |
| Projection Name: | UTM |
| Coordinate System: | State Plane |
| Date Created: | 5-Oct-91 |
| Last Updated: | 8-Nov-95 |
| Control Accuracy Map: | National Map Accuracy Standard |
| Scale: | Variable; 1" = 50 ft To 1" = 200 ft |
| Availability: | Current |
| Reviewed By: | Lee Stockholm |
| Review Date: | 19-Dec-95 |
| Spatial Extent: | Town of Amherst |
| File Format: | N/A |
| Comments: |
Entities Contained In Source
| Source ID #: | 1 |
| Entity Name: | Parcel |
| Spatial Entity: | Polygon |
| Estimate Volume Spatial Entity: | 126 per map sheet |
| Symbol: | None |
| Accuracy Description Spatial Entity: | National Map Accuracy Standard |
| Reviewed By: | Lee Stockholm |
| Review Date: | 02-Jan-94 |
| Scrub Needed: | Yes |
| Comments: |
Attributes By Entity
| Source ID #: | 1 |
| Entity Name: | Parcel |
| Attribute Name: | SBL Number |
| Attribute Description: | Section, Block, and Lot Number |
| Code Set Name: | N/A |
| Accuracy Description Attribute: | N/A |
| Reviewed By: | John Henry |
| Review Date: | 08-Feb-93 |
| Comments: |
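The sheet count and per-sheet volume recorded in the example tables above can be combined into a rough total volume estimate for later cost and schedule calculations. The following is a minimal sketch in Python; the function name is hypothetical, and the only inputs are the two values shown in the tables.

```python
# Values copied from the example metadata tables above.
NUMBER_OF_SHEETS = 200      # Source Documents: Number of Sheets
PARCELS_PER_SHEET = 126     # Entities Contained In Source: estimated volume per map sheet


def estimated_entity_count(sheets: int, per_sheet: int) -> int:
    """Rough total number of entities to convert: sheets times entities per sheet."""
    return sheets * per_sheet


total_parcels = estimated_entity_count(NUMBER_OF_SHEETS, PARCELS_PER_SHEET)
print(total_parcels)  # 25200
```

This is only a first-order estimate; it assumes every sheet carries roughly the same number of parcels.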
If there is a choice between sources, that is, two or more sources are available for a particular entity attribute, then criteria for deciding between them will be needed. In general, these criteria will be:
- Accuracy of resulting data
- Cost of conversion from source to database
- Availability of the source for conversion
- Availability of a continuing flow of data for database maintenance.
Occasionally, alternative sources may result in different representations in the database, such as a vector representation versus a scanned image. In this situation, the ability of each representation to satisfy the requirements of the GIS applications will need to be evaluated.
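One way to apply the four criteria listed above consistently is a simple weighted score for each candidate source. The sketch below is illustrative only: the weights, the 0 to 10 scoring scale, the candidate names, and all scores are assumptions made for the example and are not values prescribed by these guidelines.

```python
from dataclasses import dataclass

# Hypothetical weights for the four criteria (assumed for illustration).
WEIGHTS = {
    "accuracy": 0.40,          # accuracy of resulting data
    "conversion_cost": 0.30,   # higher score means cheaper conversion
    "availability": 0.15,      # availability of the source for conversion
    "maintenance_flow": 0.15,  # continuing flow of data for maintenance
}


@dataclass
class CandidateSource:
    name: str
    scores: dict  # criterion name -> score from 0 (worst) to 10 (best)


def weighted_score(source: CandidateSource, weights: dict = WEIGHTS) -> float:
    """Sum of each criterion score multiplied by its weight."""
    return sum(weights[c] * source.scores[c] for c in weights)


def rank_sources(sources, weights: dict = WEIGHTS):
    """Return the candidate sources ordered from best to worst score."""
    return sorted(sources, key=lambda s: weighted_score(s, weights), reverse=True)


# Two made-up candidates for the same entity.
candidates = [
    CandidateSource("Parcel Map (mylar)", {
        "accuracy": 8, "conversion_cost": 4,
        "availability": 9, "maintenance_flow": 7}),
    CandidateSource("Scanned image", {
        "accuracy": 5, "conversion_cost": 8,
        "availability": 9, "maintenance_flow": 4}),
]

best = rank_sources(candidates)[0]
print(best.name)
```

A score of this kind only supports the decision; where the alternatives lead to different representations, the check against application requirements described above still applies.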
Once a source has been selected, the metadata tables that record source data information need to be completed as appropriate. These are:
- Data Object Information
- Attribute Information
- Spatial Object Information
- Source Document Information
To complete the accuracy information, the accuracy expected from the conversion process will need to be determined. This accuracy target will also be used later in the database construction phase by the quality control procedures. The metadata tables that need to be completed at this time are shown below:
Data Object Information
| Data Object Name: | Parcel |
| Type: | Simple |
| Data Object Description: | Land ownership parcel |
| Spatial Object Type: | Polygon |
| Comments: |
Attribute Information
| Data Object Name: | Parcel |
| Data Attribute Name: | SBL Number |
| Attribute Description: | Section, Block, and Lot Number |
| Attribute Filename: | Parcel.PAT |
| Codeset Name/Description: | N/A |
| Measurement Units: | N/A |
| Accuracy Description: | N/A |
| Comments: |
Spatial Object Information
| Data Object Name: | Parcel |
| Spatial Object Type: | Polygon |
| Place Name: | Amherst |
| Projection Name/Description: | UTM |
| HCS Name: | State Plane Coordinate System |
| HCS Datum: | NAD83 |
| HCS X-offset: | 1000000 |
| HCS Y-offset: | 800000 |
| HCS Xmin: | 25 |
| HCS Xmax: | 83 |
| HCS Ymin: | 42 |
| HCS Ymax: | 98 |
| HCS Units: | Feet |
| HCS Accuracy Description: | National Map Accuracy Standard |
| VCS Name: | |
| VCS Datum: | |
| VCS Zmin: | 0 |
| VCS Zmax: | 0 |
| VCS Units: | |
| VCS Accuracy Description: | |
| Comments: |
Source Document Information
| Data Object Name: | Parcel |
| Spatial Object Type: | Polygon |
| Source Document Name: | Parcel Map |
| Type: | Map |
| Scale: | Variable: 1" = 50 feet To 1" = 200 feet |
| Date Document Created: | 17-Nov-89 |
| Date Last Updated: | 05-Oct-94 |
| Date Digitized/Scanned: | 24-Apr-95 |
| Digitizing/Scanning Method Description: | Manual digitized with Wild B8 |
| Accuracy Description: | 90% of all tested points within 2 feet |
| Comments: |
For some of the above tables, information will be available for only some of the entries. The remaining entries will be completed later as the database is implemented. The examples shown are from the metadata portion of the GIS Design software package that accompanies these guidelines. This package is a Microsoft Access program that runs "stand-alone" (you do not need a copy of Microsoft Access) on a regular PC. Where the same information is needed for multiple tables, this information is only entered once. The information is then automatically transferred to the other tables where it is needed.
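The "enter once, reuse everywhere" behavior described above can be pictured with a small data structure in which each metadata table refers to a single shared data object record instead of holding its own copy. This is a hedged sketch in Python, not the design of the GIS Design software package; all class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataObject:
    """Shared record: entered once, referenced by the other tables."""
    name: str
    description: str
    spatial_object_type: str


@dataclass
class AttributeInfo:
    data_object: DataObject
    attribute_name: str
    attribute_description: str

    @property
    def data_object_name(self) -> str:
        # Read through to the shared record instead of storing a copy.
        return self.data_object.name


@dataclass
class SourceDocumentInfo:
    data_object: DataObject
    source_document_name: str
    scale: str

    @property
    def data_object_name(self) -> str:
        return self.data_object.name

    @property
    def spatial_object_type(self) -> str:
        return self.data_object.spatial_object_type


# Values taken from the example tables above.
parcel = DataObject("Parcel", "Land ownership parcel", "Polygon")
sbl = AttributeInfo(parcel, "SBL Number", "Section, Block, and Lot Number")
source = SourceDocumentInfo(
    parcel, "Parcel Map", 'Variable: 1" = 50 feet To 1" = 200 feet')

print(sbl.data_object_name, source.spatial_object_type)
```

Because both records read the name from the same `DataObject`, a correction made there appears in every table that uses it.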
3. THE LOGICAL/PHYSICAL DESIGN OF THE GIS DATABASE
This activity involves converting the conceptual design to the logical/physical design of the GIS database (hereafter referred to as the physical design). The GIS software to be used dictates most of the physical database design. The structure and format of the data in a GIS such as ARC/INFO, Intergraph, MapInfo, or System 9 have already been determined by the respective vendor. If one separates the conceptual entity and its attributes from the corresponding spatial entity and its geometric representation, it can be seen that the physical database design for the spatial entity has been completely defined by the vendor, and the GIS designer does not need to do anything more for this part of the data. The attributes of the entities may, however, be held in a relational database management system linked to the GIS. If this is the case, the GIS analyst needs to design the relational tables for the attribute information.
One entity from the E-R diagram will not always translate into a single layer. More complex representations will sometimes be needed. Generally this will involve two or more entities forming a single layer with, possibly, several relational database tables.
For example, the water main segments, the valves and the fire hydrants may be placed together in one layer as line segments and two sets of nodes. However, each entity has its own relational table to record its respective attributes (see Table 1). The relationship is maintained by unique keys for each instance of each entity.
Every entity shown on the E-R diagram must be translated to either a GIS layer, a relational table(s), or both, as indicated by the information to be included. In addition, every relationship of the type "relationship represented in database" (single line hexagon on the E-R diagram) must be implemented through the primary and secondary keys in the tables for the entities represented.
For example, the entity "parcel" may "contain" the entity "building." The table for each entity would have its own primary key (ID#). The table for building, however, must also have a secondary key (parcel ID#) to maintain the relationship in the database.
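The parcel and building relationship can be sketched with two relational tables. The example below uses Python's built-in sqlite3 module purely for illustration; the table and column names and the sample values are hypothetical, and what the text calls a secondary key is expressed here as a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE parcel (
    parcel_id   INTEGER PRIMARY KEY,   -- primary key (ID#)
    sbl_number  TEXT NOT NULL
);
CREATE TABLE building (
    building_id INTEGER PRIMARY KEY,   -- primary key (ID#)
    parcel_id   INTEGER NOT NULL
                REFERENCES parcel(parcel_id),  -- key that records "parcel contains building"
    use_code    TEXT
);
""")

# Made-up sample rows.
conn.execute("INSERT INTO parcel VALUES (1, '12-3-45')")
conn.executemany(
    "INSERT INTO building VALUES (?, ?, ?)",
    [(10, 1, "garage"), (11, 1, "house")],
)


def buildings_on_parcel(connection, parcel_id):
    """Return the IDs of all buildings contained in the given parcel."""
    rows = connection.execute(
        "SELECT building_id FROM building WHERE parcel_id = ? ORDER BY building_id",
        (parcel_id,),
    ).fetchall()
    return [row[0] for row in rows]


def try_insert_building(connection, building_id, parcel_id):
    """Insert a building; return False if its parcel does not exist."""
    try:
        connection.execute(
            "INSERT INTO building VALUES (?, ?, NULL)", (building_id, parcel_id))
        return True
    except sqlite3.IntegrityError:
        return False


print(buildings_on_parcel(conn, 1))
```

With the constraint in place, the database itself rejects a building that refers to a parcel that does not exist, so the relationship cannot silently break during data conversion or maintenance.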
The completed physical database design must account for all entities and their attributes, the spatial object with topology and coordinates as needed, and all relationships to be contained in the database. The remaining items on the E-R diagram, the two types of spatial relationships, must be accounted for in the list of functional capabilities, that is, the implied spatial operations must be possible in the chosen GIS software.
4. PROCEDURES FOR BUILDING THE GIS DATABASE
Developing a GIS database is frequently thought of as simply replicating a map in a computer. As can be inferred by the nature and detail of the activities recommended up to this point in these guidelines, building a GIS database involves much more than "replicating a map." While substantial portions of the GIS database will come from map source documents, many other sources may also be used, such as aerial photos, tabular files, other digital data, etc. Also, the "map" representation is only part of the GIS database. In addition to the map representation and relational tables, a GIS can hold scanned images (drawings, plans, photos), references to other objects, names and places, and derived views from the data. The collection of data from diverse sources and its organization into a useful database requires development of procedures to cover the following major activities:
- Getting the data, which may include acquiring existing data from both internal and external sources, evaluating and checking the source materials for completeness and quality, and/or creating new data by planning and conducting aerial or field surveys. Contemporary GIS projects attempt to rely on existing, rather than new, data due to the high cost of original data collection. However, existing data (maps and other forms) were usually created for some other purpose and thus have constraints for use in a GIS. This places much greater importance on evaluating and checking the suitability of source data for use in a GIS.
- Fixing any problems in the data source. Because it often focuses only on map source documents, this activity has been called "map scrubbing." Depending on the technology to be used to convert the map graphic image into its digital form, the source documents will have to meet certain standards. Some conversion processes require the map to be almost perfect, while other processes attempt to automate all needed "fixes" to the map. What is required here is for the GIS analyst to specify, in detail, a procedure capable of converting the map documents into an acceptable digital file while accounting for all known problems in the map documents. This procedure should be tested in the pilot project and modified as needed.
- Converting to digital data, the physical process of digitizing or scanning to produce digital files in the required format. The major decision here is whether to use an outside data conversion contractor or to do the conversion within the organization. In either case, specifications describing the nature of the digital files should be prepared. In addition to including the physical database design, specifications should describe the following:
- Accuracy requirements (completeness required, positional accuracy for spatial objects, allowable classification error rates for attributes).
- Quality control procedures that will be conducted to measure accuracy.
- Partitioning of the area covered by the GIS into working units (map sheets) and how these will be organized in the resulting database (including edge matching requirements).
- Document and digital file flow control, including logging procedures, naming conventions, and version control.
- Change control. Most map series are not static but are updated on a periodic basis. Once a portion of the map has been sent to digitizing (or whatever process is used), a procedure must be in place to capture any updates to the map and enter these into the digital files.
- Building the GIS database. Once digitizing has been completed, the sponsoring organization has a set of digital files, not an organized database. The system integration process (a subsequent guideline document) must take all the digital files and set up the ultimate GIS database in a form that will be efficient for the users. The several considerations required for this process are covered under GIS Database Construction, GIS System Integration and GIS Maintenance and Use.
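The accuracy requirements and quality control procedures listed above can be made concrete with a positional accuracy test. The sketch below checks a target of the kind shown in the earlier Source Document Information example ("90% of all tested points within 2 feet"). It assumes that each digitized check point has a matching reference coordinate in the same units; the function names and the sample coordinates are hypothetical.

```python
import math


def fraction_within_tolerance(digitized, reference, tolerance_ft):
    """Fraction of check points whose horizontal error is within the tolerance.

    digitized and reference are equal-length lists of (x, y) pairs in feet,
    matched by position.
    """
    if not digitized or len(digitized) != len(reference):
        raise ValueError("need two non-empty point lists of equal length")
    within = sum(
        1
        for (x1, y1), (x2, y2) in zip(digitized, reference)
        if math.hypot(x1 - x2, y1 - y2) <= tolerance_ft
    )
    return within / len(digitized)


def passes_accuracy_target(digitized, reference,
                           tolerance_ft=2.0, required_fraction=0.90):
    """True if enough tested points fall within the tolerance."""
    return fraction_within_tolerance(
        digitized, reference, tolerance_ft) >= required_fraction


# Made-up check points: nine are off by about 0.7 ft, one is off by 3 ft.
reference_points = [(i * 100.0, i * 50.0) for i in range(10)]
digitized_points = [(x + 0.5, y + 0.5) for x, y in reference_points[:9]]
digitized_points.append((reference_points[9][0] + 3.0, reference_points[9][1]))

print(passes_accuracy_target(digitized_points, reference_points))
```

Running the same check on each working unit (map sheet) as it comes back from conversion gives a consistent acceptance test, whether the work is done by a contractor or in-house.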
5. PROCEDURES FOR MANAGING AND MAINTAINING THE DATABASE
Because the physical world is constantly changing, the GIS database must be updated to reflect these changes. Once again, the credibility of the GIS database is at stake if the data is not current.
Usually, the effort required to maintain the database is as much as, or more than, that required to create it. This ongoing maintenance work is usually assigned to in-house personnel rather than to a contractor. The entire process should be planned well in advance: the equipment and personnel must be ready to take over the maintenance of the database when the data conversion effort and database building processes are complete.
Database maintenance requires two supporting efforts: ongoing user training and user support. Ongoing user training is needed to replace departing users with newly trained personnel. This will enable the data maintenance to be carried out on a continuous and timely basis. It is also important to offer advanced training to existing users to provide them with the opportunity to improve their skills and to make better use of the system.
GIS is a complicated technology, making operating problems inevitable. User support will help users solve these problems quickly. Support staff can also customize the GIS software so that users can execute processing tasks more quickly and efficiently. User support is usually provided by in-house or contract programmers. It requires knowledge of the operating system and macro programming language, as well as the ability to troubleshoot common command and file problems.
6. GIS DATA SHARING COOPERATIVES
The establishment of data sharing cooperatives within the public sector is encouraged as a cost-effective means of database development and maintenance. Cooperative, multiparticipant database projects allow for data exchange and create the opportunity for new means of developing, maintaining, and accessing information. The sharing of data in the public sector, especially between government agencies and offices which are funded by the same financial resources, should be expected. It does not make fiscal sense for public funds to be utilized in the development of two GIS databases of the same geographic area for two different agencies. Benefits of data sharing thus would include: the development of a much larger database for far less cost; the development of more efficient interaction between public agencies; and, through the utilization of a single, seamless database, the availability of more accurate information, since all agencies would share the same up-to-date information.
The goal of a data sharing strategy is to maximize the utility of data while minimizing the cost to the organization. It is important that all parties involved have clear and realistic expectations as well as common objectives to make the data sharing work. Under any circumstance, however, database management and maintenance will require us to redefine our relationships with those with whom we routinely exchange data, whether they are within an organization or part of a multiparticipant effort including outside agencies. Work flow and information flow must be reviewed and changed if necessary. Procedures and practices for the timely exchange and updating of data must be put in place and data quality standards adhered to, whether the data are hard copy which must be converted for inclusion or digital files which might be available for importing to our system. Systematic collection and integration of new and/or updated data must be employed in order to safeguard the initial investment, maintain the integrity of the database and assure system reliability to meet functional needs.

