Database Construction
- Introduction
- Information Required to Support Data Conversion Process
- Data Conversion Technologies Available
- Data Conversion Contractors
- Data Conversion Processes
- Attribute Data Entry
- External Digital Data
- Accuracy and Final Acceptance Criteria
- Glossary
Scope Of Database Construction
The database construction process is divided into two major activities:
- creation of digital files from maps, air photos, tables and other source documents;
- organization of the digital files into a GIS database.
This guideline document describes the first process, digital conversion, and the subsequent guideline entitled "GIS System Integration" deals with the organization of the digital files into a database.
2. INFORMATION REQUIRED TO SUPPORT DATA CONVERSION PROCESS
Data Model
GIS technology employs computer software to link tabular databases to map graphics, allowing users to visualize their data quickly. This can take the form of generating maps, running on-line queries, producing reports, or performing spatial analysis.
GIS (Spatial) Data Formats
In digital form, GIS data is composed of two types: map graphics (layers) and tabular databases.
- Map graphics represent all of the features (entities) on a map as points, lines, areas, or pixels.
- Tabular databases contain the attribute information which describes the features (buildings, parcels, poles, transformers, etc.).
GIS data layers are created through the process of digitizing. The digitizing process produces the digital graphic features (point, line or area) and their geographical location. Tables can be created from most database files and can be loaded into a GIS from spreadsheet or database software programs like Excel, Access, FoxPro, Oracle, Sybase, etc. A common key must be established between the map graphics and the tabular database records to create a link. This link is usually defined during the scrubbing phase (data preparation) and created during data capture (digitizing). For parcel data, the parcel-id or SBL number (section, block and lot) is a good example of a common key. The map graphic (point or polygon) is assigned an SBL number as it is digitized. The database records are created with an SBL number and other attributes of the parcel (value, landuse, ownership, etc.).
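For illustration, the sketch below (written in Python, with invented SBL values, coordinates and attribute fields) shows the role the common key plays in tying a digitized parcel graphic to its tabular record. It is a conceptual sketch only, not a depiction of how any particular GIS package stores its data.

```python
# Hypothetical parcel link: the SBL number is the common key shared by the
# digitized graphic and the tabular assessment record (values are invented).

# Digitized map graphics: each parcel polygon carries the SBL assigned to it
# during data capture, along with its captured vertex coordinates.
parcel_graphics = {
    "086.12-3-4": [(1023.5, 887.2), (1101.0, 887.9), (1100.4, 940.3), (1022.8, 939.7)],
}

# Tabular records keyed by the same SBL number (scrubbed attribute data).
parcel_attributes = {
    "086.12-3-4": {"value": 125000, "land_use": "residential", "owner": "Smith"},
}

# The GIS "link" is, conceptually, a lookup on the shared key.
sbl = "086.12-3-4"
print(sbl, parcel_attributes[sbl]["owner"], parcel_graphics[sbl][0])
```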
Raster and Vector Format
GIS allows map or other visual data to be stored in either a raster or vector data structure:
There are two types of raster or scanned image: 1) remotely sensed data from satellites; and 2) scanned drawings or pictures. Satellite imagery partitions the earth's surface into a uniform set of grid cells called pixels. This type of GIS data is termed raster data. Most remote sensing devices record data from several wavelengths of the electromagnetic spectrum. These values can be interpreted to produce a "classified image" in which each pixel has a value that represents conditions on the earth's surface (e.g., land use/land cover, temperature, etc.). The second type of scanned image is a simple raster image in which each pixel can be either black or white (on or off) or can have a set of values to represent colors. These scanned images can be displayed on computer screens as needed.
Raster data is produced by scanning a map, drawing or photo. The result is an array of pixels (small, closely packed cells) which are either turned "on" or "off." A simple scanned image, for example in TIFF (Tagged Image File Format), cannot be used for GIS analysis and has only display value. The "cells" of the digital version of the image have no actual geographical meaning; they represent only the dimensions of the original analog image. Raster data in its most basic form is purely graphical and has no "intelligence" or associated database records.
Raster data can be enhanced to provide spatial analysis within a GIS. Pixels or cells represent measurable areas on the earth's surface and are linked to attribute information. These cells are assigned numeric values which correspond to the type of real-world entity which is represented at that location (e.g., cells containing value "2" may represent a lake, cells of value "3" may represent a particular wooded area, etc.).
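As a small illustration of this idea, the Python sketch below builds a tiny classified raster and tallies the area covered by each class; the class codes, labels and 30-meter cell size are assumptions made for the example.

```python
import numpy as np

# A tiny classified raster: each cell value codes a land-cover class.
# The codes, labels and 30 m cell size are assumptions for this example.
classified = np.array([
    [2, 2, 3, 3],
    [2, 2, 3, 1],
    [1, 1, 1, 1],
])
class_names = {1: "developed", 2: "lake", 3: "wooded"}
cell_area_m2 = 30 * 30  # assumed cell resolution of 30 meters

# Tally how much area each class covers by counting its cells.
for code, count in zip(*np.unique(classified, return_counts=True)):
    print(class_names[int(code)], count * cell_area_m2, "square meters")
```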
- Vector data represents map features in graphic elements known as points, lines and polygons (areas).
Vector graphics are represented as a single x,y coordinate or a series of x,y coordinates. Data is normally collected in this format by tracing map features on the actual source maps or photos with a stylus on a digitizing board. As the stylus passes over the feature, the operator activates the appropriate control for the computer to capture the x,y coordinates. The system stores the x,y coordinates within a file. Vector data can also be collected on-screen (called "heads-up" digitizing) by tracing a scanned image on the computer screen in a similar manner.
3. DATA CONVERSION TECHNOLOGIES AVAILABLE
Manual Digitizing
Manual digitizing involves the use of a digitizing tablet and a cursor tool called a "puck," a plastic device holding a coil with a set of locator cross-hairs used to select and digitally encode points on a map. A trained operator securely mounts the source map upon the digitizing tablet and, utilizing the cross-hairs on the digitizing puck, traces each linear feature to be captured in the digital file. The tablet records the movement of the puck and captures the feature's coordinates. The work is time-consuming and labor intensive. Concentration, skill and hand-eye coordination are crucial in order to maintain the positional accuracy and completeness of the map features.
Traditional data conversion efforts are based on producing a vector data file compiled by manually digitizing paper maps. Vector data provides a high degree of GIS functionality by associating attributes with map features, allowing graphic selections, spatial queries and other analytical uses of the data. Vector data also carries the highest costs for conversion. The industry average for a complete data conversion project to digitize parcel lines, dimensions and text is between $3.00 and $5.00 per parcel. The price is determined by the complexity and amount of data. To keep costs down, data can be selectively omitted from conversion (e.g., not all text and annotation will be captured). The resulting vector data can reproduce a useful, albeit visually starker, version of the original map. A bare-bones data conversion project can be conducted by digitizing only the linework from the tax maps. The minimum industry cost for digitizing parcel linework with a unique ID only is between $1.00 and $1.50 per parcel.
Scanning
Scanning converts lines and text on paper maps into a series of picture elements or pixels. The higher the resolution of the scanned image (more dots per inch), the smoother and more accurately defined the data will appear. As the dots per inch (DPI) increase, so does the file size. Most tax maps should be captured at a scan resolution of 300-400 DPI. One of the main advantages of scanning is that the user sees a digital image that looks identical to the paper maps -- complete with notes, symbology, text style, coffee stains, etc. Scanning can replicate the visual nature of the original map at a fraction of the cost of digitizing. However, this low cost has a "price": the raster image is a dumb graphic -- there is no "intelligence" associated with it, i.e., individual entities cannot be manipulated. Edge-matching and geo-referencing the images (associating the pixels with real-world coordinates) improves the utility of the scanned images by providing a seamless view of the raster data in an image catalog. Scanned images require more disk space than an equivalent vector dataset, but the trade-off is that the raster scanning conversion process is faster and costs less than vector conversion.
Raster to Vector Conversion
Scanned data, in raster format, can be "vectorized" (converted into vector data) in many high-end GIS software packages or through a stand-alone data conversion package. Vectorizing simply involves running a scanned image through a conversion program. In the vectorization process, features which are represented as pixels are converted into a series of x,y points and/or linear features with nodes and vertices. Once converted within a GIS environment, the data is in the same format as data created using a digitizing tablet and cursor. Many vectorized datasets require significant editing after conversion.
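As a rough illustration of the concept (not the algorithm used by any particular conversion package), the Python sketch below traces the outlines of features in a small binary raster using the marching-squares contour finder from the scikit-image library, producing coordinate strings comparable to tablet-digitized linework. Production vectorization software adds line thinning, smoothing, node building and attribute tagging on top of this basic step.

```python
import numpy as np
from skimage import measure  # assumes the scikit-image package is installed

# A tiny binary "scanned" image: 1 = inked pixel, 0 = background.
raster = np.zeros((10, 10), dtype=np.uint8)
raster[2:8, 2:8] = 1  # a filled square standing in for a map feature

# Marching squares traces each feature boundary as an ordered list of vertices.
contours = measure.find_contours(raster, level=0.5)

for line in contours:
    # Each contour becomes a candidate vector line (a series of x,y vertices);
    # a real conversion workflow would thin, smooth and attribute these lines.
    xy = [(float(col), float(row)) for row, col in line]
    print(len(xy), "vertices, starting at", xy[0])
```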
Hybrid Solution
Since both vector and raster datasets have decided advantages and disadvantages, a hybrid solution capitalizes on the best of both worlds. Overlaying vector format data on a geo-referenced backdrop image provides a powerful graphic display tool. The combined display can show the vector map features and their attributes (also available for GIS query), along with an exact replica of the scanned source material, which may be a tax map or aerial photography. If needed, individual parcels, pavement edges, city blocks or entire maps can be vectorized from the geo-referenced scanned images. This process is called incremental conversion. It allows the county to convert scanned raster data to vector format on an as-needed basis. There is a plethora of raster-to-vector conversion routines on the market, but it is important that the conversion take place in the same map coordinate system and data format as your existing data. The key advantage of the hybrid approach is this: even after full vectorization, the scanned images continue to provide a higher-quality graphic image as a visual backdrop behind the vector data.
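The geo-referencing step behind such a backdrop amounts to an affine mapping from pixel (column, row) positions to real-world coordinates, of the kind commonly stored as six coefficients in an image "world file." The Python sketch below shows the arithmetic; the coefficient values are purely illustrative assumptions.

```python
# Illustrative world-file style coefficients (the values are assumptions):
# pixel size, rotation terms, and the map coordinates of the upper-left pixel.
a, b, c = 2.0, 0.0, 615000.0    # x_map = a*col + b*row + c
d, e, f = 0.0, -2.0, 4705000.0  # y_map = d*col + e*row + f (y pixel size is negative)

def pixel_to_map(col, row):
    """Convert a scanned-image pixel position to map coordinates."""
    return a * col + b * row + c, d * col + e * row + f

# The upper-left pixel lands at the image origin; pixel (500, 300) falls
# 1000 map units east and 600 map units south of it at a 2-unit pixel size.
print(pixel_to_map(0, 0))
print(pixel_to_map(500, 300))
```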
Entry of Attribute Data
Additional attribute data can be added to the database by joining a table which contains the new attributes to an existing table already in the GIS. To join these tables together, a common field must be present. Most GIS software can then use the resulting table to display the new attributes linked to the entities. There are various sources for building an attribute database for a GIS, from CD-ROM telephone and business market listings with addresses, to data maintained in various government databases in dBASE or other database formats.
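A minimal sketch of such a join, using the pandas library with an assumed SBL key field (the field names and values are illustrative), might look like this:

```python
import pandas as pd

# Table already in the GIS: one record per digitized parcel graphic.
gis_table = pd.DataFrame({
    "SBL": ["086.12-3-4", "086.12-3-5"],
    "acres": [0.25, 0.31],
})

# New attribute table from an external source, keyed on the same field.
assessment = pd.DataFrame({
    "SBL": ["086.12-3-4", "086.12-3-5"],
    "assessed_value": [125000, 98000],
    "land_use": ["residential", "residential"],
})

# Joining on the common field links the new attributes to the map features.
joined = gis_table.merge(assessment, on="SBL", how="left")
print(joined)
```

Full GIS packages perform an equivalent join internally when a new table is related to the feature attribute table on its key field.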
Acquisition of External Digital Data
The availability of existing digital data will have an effect upon the design of the database. Integrating existing databases with the primary GIS will require the establishment of common data keys and other unique identifiers. Issues of data location, data format, record match rates, and the overall value of integrating the external data should all be considered before deciding to purchase or acquire existing datasets.
GIS Hardware And Software Used in Digital Data Conversion
Most contemporary GIS software packages are structured to operate on computer workstations to accomplish digitizing and editing tasks.
Four basic types of workstations can be identified:
- A digitizing station, which is connected to a precision digitizing tablet, utilizes a high-resolution display terminal, and has all of the analysis functions necessary for querying, displaying and editing data
- An editing workstation, which is used for conducting most of the QA/QC functions of the conversion process and has all the functionality of the digitizing station except the ability to digitize data via a digitizing tablet
- A graphic data review/tabular data input workstation, which is used for displaying and reviewing graphic data and for entering the tabular attribute data associated with these features
- An X terminal, which allows for graphic display and input of data utilizing the X Window System communications protocol
With the increasing power of today's personal computers, many GIS analysis packages are being designed for PCs. As GIS data files are very large, PC-based GIS packages usually require a PC with minimum requirements including a 486 processor and 16 megabytes of RAM. Hard-drive disk space depends upon how large the datasets being used are. A safe bottom line for hard-drive space on a PC is 500 megabytes. For most data conversion projects, much more hard-drive space will be needed in order to store data as they are converted. Tape storage hardware is also necessary in order to efficiently back up the many megabytes of files created in the conversion process. To provide an idea of the storage requirements necessary for basic scanning conversion, the file size of one tax map alone, in TIFF (Tagged Image File Format) image format, scanned at a 500 dots per inch (dpi) resolution, can range anywhere from 1-3 megabytes.
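The file-size figure can be roughly checked with a back-of-the-envelope calculation, shown below in Python; the sheet dimensions and compression ratio are assumptions, not figures taken from this guideline.

```python
# Rough storage estimate for one scanned tax map, assuming a 24 x 36 inch sheet,
# a 1-bit black-and-white scan at 500 dpi, and roughly 15:1 Group 4 compression.
width_in, height_in = 24, 36
dpi = 500
bits_per_pixel = 1
compression_ratio = 15  # assumed; compressed linework commonly shrinks 10:1 to 20:1

pixels = (width_in * dpi) * (height_in * dpi)
uncompressed_mb = pixels * bits_per_pixel / 8 / 1_000_000
compressed_mb = uncompressed_mb / compression_ratio

print(f"uncompressed: {uncompressed_mb:.1f} MB, compressed: {compressed_mb:.1f} MB")
# About 27 MB uncompressed and roughly 2 MB compressed, consistent with the
# 1-3 megabyte range cited above.
```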
Digitizing hardware requirements vary according to the conversion approach which is applied. For vector conversion, a digitizing tablet will usually be necessary for the manual digitizing process. Another piece of digitizing hardware, a scanner, is used to create raster images. Automatic digitization through the use of a scanner is a very popular approach for capturing data. Raster data can subsequently be transformed into vector data in most turn-key GIS packages through the use of raster-to-vector conversion algorithms.
After the conversion of map data into digital form, hardware will be needed for outputting digital data in hardcopy format. When handling a data conversion project, a necessary piece of output hardware is a pen or raster plotter. GIS software allows for the creation of plots at any view scale. The plotter, with its ability to draw on a variety of materials (including paper, mylar and vellum), allows for the creation of quality map plots. Most plotters have a minimum width of three feet. Vector and raster plotters are both available on the market. Vector (pen) plotters utilize various pens for the drawing of linear features on drawing media. Pen plotters can handle most plotting jobs, but they do not produce good results in area shading, such as in the production of choropleth maps. Raster plotters, on the other hand, are excellent at producing shading results. Raster plotters usually cost more than vector plotters, but are substantially more versatile and have better capabilities.
Other output devices for the creation of hardcopies of GIS data include screen copy devices, used for copying screen contents onto paper without having to produce a plot file; computer FAX (facsimile) transmissions, often used in communications between conversion contractors and clients, which produce small letter-size plots and whose transmission files (as raster images) can be saved and viewed later; and printers, which are used to output tabular data derived from the GIS and, if configured correctly, can also produce small letter-size plots.
Pilot Project/Benchmark Test Results
The pilot project is a very important activity that precedes the data conversion project. The pilot project allows you, the GIS software developer, and the data conversion contractor to test and review the numerous steps involved in creating the database. Defining the pilot study area involves selecting a small geographic area which offers a high likelihood of success, that is, one that can be completed in a relatively short period of time and allows for the testing of all necessary project elements (conversion procedures, applications, database design). Test results obtained from the pilot project usually include assessments of: database content, conversion procedures, suitability of sources, database design, efficiency of prepared applications on datasets, the accuracy of final data, and cost estimates.
Identified Problems With Source Data
The pilot study involves testing and finding successes and problems in procedures and designs for the GIS. It involves looking for problems that occur due to a lack of, or inadequacy in, source data. It is especially important to identify problems at the source data level, since it is usually easiest and cheapest to correct errors prior to data conversion.
When evaluating the results of a pilot study, problems with digital data accuracy resulting from source data flaws are bound to arise. Usually, the source data used for a project are not in the proper format required for the best possible result. For example, problems may arise when the source data for a certain data layer consist of maps at various scales. These scale differences can create error when the digitized layers are joined into a single layer. Other problems arise when there are not adequate control points on the map sheets to accurately register coverages while they are being digitized. At times, even adjacent large-scale source map sheets may have positional discrepancies between them. Such inconsistencies will be reflected in the corresponding digital data. Procedures for dealing with all known source data problems need to be specified prior to the start of data conversion.
4. DATA CONVERSION CONTRACTORS
Firms Available And Services Offered
Different types of firms can handle GIS data conversion. Some firms specialize in GIS data conversion and subcontract the services of other firms as needed. Other firms which handle data conversion, but do not specialize in it alone, include aerial mapping firms, engineering firms and GIS vendors. Various firms will offer standard data conversion services but, based upon their main type of work, may also offer some unique services. For example, a firm specializing in GIS data conversion may have a wide variety of software options from which the client can choose. Such a firm usually will have numerous digitizing workstations and a large staff, and will be able to complete the project in a shorter period of time than firms which do not specialize in GIS data conversion. If needed, a specialized GIS data conversion company can subcontract services from another company.
Aerial mapping firms can offer many specialized data conversion services associated with photogrammetry which will not be available directly through a general data conversion contractor. Many aerial mapping firms now have considerable expertise in the creation of digital orthophoto images, rectified and scaled scans of aerial photography which can be displayed and utilized with vector data. Engineering and surveying firms are well equipped to deal with most data conversion projects and will usually have a major civil engineering/surveying unit within the organization. These firms usually focus upon certain aspects of GIS and approach conversion projects with emphasis upon the extent of construction detail, positional accuracy requirements, COGO input, scale requirements and database accuracy issues. At times, GIS software vendors will handle data conversion projects in order to test their software in benchmark studies and pilot projects.
The main conversion services which are usually offered include: physical GIS database design and implementation, deed research, record compilation, scrubbing, digitizing, surveying, programming and image development and registration.
Approximate Cost of Services
Outsourcing data conversion with data purchase/ownership
| CONVERSION METHOD | PER-PARCEL COST |
|---|---|
| Manually digitized vector data (linework alone) | $1.20 / parcel |
| Manually digitized vector data (linework & annotation) | $5.00 / parcel |
| Vector data developed from the vectorization of scanned maps (linework & annotation) | $3.00 / parcel |
| Raster image data (registered to a coordinate system) | $50 / map (approx. $0.55 / parcel) |
Outsourcing Data Conversion and Licensing Data
| CONVERSION METHOD | PER-PARCEL COST |
|---|---|
| Manually digitized vector data (linework & annotation) | $1.50 / parcel |
| Raster image data | No cost estimates available |
(Note: All of the above cost estimates are based upon average prices offered by various data conversion vendors)
Making Arrangements For External Data Conversion
There are a number of ways of arranging for the digital conversion of map data. Arrangements are usually made through the development of a Request for Proposal (RFP) and evaluation of the proposals submitted by various conversion contractors. Criteria to consider in selecting a conversion contractor include: the company's technical capability, its experience with data conversion, its range of services, location, personnel experience and the overall technical plan of operation. Balanced against all of these items are usually the organization's budget and the costs associated with the project.
Digital Conversion Of Mapped Data
Digital conversion of mapped data is a costly and time-consuming effort. The more closely the digital data reflects the source document, and the more attributes are associated with the map features, the higher the map's utility, but also the higher the cost of conversion. Because of the high cost of digitizing all graphic map features and text/graphic symbology, conversion efforts may compromise data functionality by limiting the number of features captured in order to keep costs down. The processes involved in digital conversion of mapped data are usually the most involved and most time-consuming of all. These two traits together explain why data conversion is usually the largest cost of implementing a GIS.
Planning The Data Conversion Process
The data conversion process needs to be planned effectively in order to minimize the chance of data conversion problems which can greatly disrupt the normal workflow of the organization. It is necessary to plan all of the physical processes which will be involved in data conversion and to develop time-estimates for all work. These main processes include:
- Specifications
- Source map preparation
- Document flow control
- Supervision plans
- Problem resolution procedures
These procedures allow for the efficient conversion of mapped data. Guidelines for normal data capture procedures such as scanning and table digitizing should be developed to ensure that all data are consistently digitized. Particularly when an organization is conducting conversion in-house, a small amount of time invested in developing error prevention procedures will greatly benefit the organization by saving time in the correction/editing phase of the conversion. It is easier to prevent errors than to correct them after the digitizing has been done.
Data Conversion Specifications: Horizontal And Vertical Control, Projection; Coordinate System, Accuracy Requirements
Any discussion about data conversion should start with the topic of accuracy. We've all heard the expression, "Garbage In, Garbage Out." If the accuracy standards established early in a GIS conversion project cannot be met, the resulting GIS may be useless because of its lack of accuracy. Still, in reality, when building a GIS and handling data conversion, we are faced with a variety of source documents which may each carry a different scale, resolution, quality and level of accuracy. Some source map data may be so questionable that it should not be loaded into the GIS. Extracting reliable data later on from the GIS will depend upon either converting data from reliable source documents or developing new data "from scratch."
Map projections affect the way that map features are displayed (as they affect the amount of visual distortion of the map) and the way map coordinates are distributed. Before any GIS graphic data layers will be ready for overlay functions, the layers must be referenced to a common geographic coordinate system. GIS software can display data in any number of projection systems, such as UTM (Universal Transverse Mercator), State Plane Coordinate Systems, and more. For scanned maps and aerial photos (which are simple non-GIS raster images) to be displayed effectively with vector data, the images need to be registered and rectified to the same coordinate system.
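As a brief sketch of converting coordinates between systems, the pyproj library (an assumption here; any full-feature GIS provides equivalent projection tools) can transform a point from geographic latitude/longitude into a UTM zone:

```python
from pyproj import Transformer  # assumes the pyproj package is installed

# Transform from geographic coordinates (EPSG:4326, longitude/latitude)
# to NAD83 / UTM zone 18N (EPSG:26918); the test point is illustrative.
to_utm = Transformer.from_crs("EPSG:4326", "EPSG:26918", always_xy=True)

lon, lat = -76.15, 43.05
easting, northing = to_utm.transform(lon, lat)
print(round(easting, 1), round(northing, 1))
```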
Establishing specific requirements for map accuracy should be done at the beginning of a project. If a certain level of accuracy is desired, that level must be carried through all subsequent aspects of the project. Procedures should be standardized in order to ensure the best and most consistent results possible.
Source Map Preparation (Pre-Digitizing Edits)
Preparing the analog data that will be converted is an important first step. This needs to be done whether the data will be scanned or digitized, and whether you are outsourcing the work or completing it in-house. This pre-processing is also referred to as "scrubbing" the data. The process involves coding the source document using unique IDs and/or using some method to highlight the data that should be captured from these documents. This makes it clear to the person performing the scanning or digitizing what they should be picking up. It will also be important later for performing quality control checks and to make sure that the digital data has a link to the attribute database needed for a GIS.
Document Flow Control
Without a clear system for monitoring and planning the flow of map (and attribute data) documents between their normal storage locations and the parties handling the actual data conversion, problems will usually arise in tracking the location of maps. When a large number of maps is being converted, it is important that both the conversion contractor or in-house conversion staff and the normal users of the source documents understand exactly which documents are being handled, and when. Source maps are delivered to the conversion group or contractor as a work packet, usually consisting of a manageable number of maps of a certain geographic region, which is pre-determined within the data conversion workplan. A scheme for tracking packets of source documents, as well as the resulting digital files, is needed. This scheme should be able to track the digital file through the quality control processes.
In addition to tracking the flow of documents and digital files through the entire data conversion process, a procedure needs to be established for handling updates to the data that occur during the conversion period. This change control procedure may be quite similar to the final database maintenance plan; however, it must be in place before any of the data conversion processes are started. Also, this procedure will likely be very different from the previous manual map updating methods and may involve substantial restructuring of tasks and responsibilities within the organization.
Supervision Plans (Particularly For Contract Conversion)
When planning the data conversion process, it is important that attention be given to the development of detailed plans for supervising the data conversion process. Supervisory plans allow the organization to distribute responsibility for the many different facets of the data conversion project. When data conversion has been contracted out, it is important that communication be maintained between the client organization and the contractor. The development of specific variations of normal administrative tools used for scheduling and budget control can be very useful (e.g., CPM/PERT scheduling procedures, Gantt charts, etc.).
Problem Resolution Procedures
In order to ensure the efficient progress of all aspects of the data conversion project, it is important to develop formal procedures for problem resolution. Editing procedures and data standards should be developed for such items as: major and minor positional accuracy problems; inaccurate rubber-sheeting, or map-joining/file-matching problems; attribute coding errors, etc. Other procedures for events such as missing source data, handling various scale resolution issues, and even hardware and software system problems should also be created. Establishing such procedures and assigning responsibilities for resolution are extremely important, particularly when outside contractors are involved.
Converting The Data
As stated earlier, it is important to follow consistent, pre-established procedures in the actual digitizing of the datasets. Consistently using a tested and approved set of conversion guidelines and procedures minimizes ambiguity in methods and allows for the most consistent product possible.
Reviewing Digital Data
The digital data review process involves three issues:
- data file format and format conversion problems
- data quality questions
- data updating and maintenance
The review process must be completed before the decision to rely on other digital data sources is made. Additionally, formal data sharing agreements should be made between the providing and receiving organizations.
Quality Control (Accuracy) Checking Procedures
A quality assurance (QA) program is a crucial aspect of the GIS implementation process. To be successful in developing reliable QA methods, individual tasks must be worked out and documented in detail. Data acceptance criteria are a very important aspect of the conversion program and can be a complex issue. A full analysis of accuracy and data content needs will facilitate the creation of documentation which may be utilized by the accuracy assessment team.
A combination of automatic and manual data verification procedures is normally found in a complete QA program. The actual process normally involves validation of the data against the source material, evaluation of the data's utility within the database design, and an assessment of the data with regard to the standards established by the organization handling the conversion project. Automated procedures will normally require customized software in order to perform data checks. Most GIS packages today have their own macro programming languages which allow for the creation of customized programs. Some automated QA procedures include: checking that all features are represented according to conversion specifications (e.g., placed in the correct layer); checking that features requiring network connectivity are represented with logical relationships (for example, two different diameters of piping or two different gauges of wire must have a connecting device between them, represented by a graphic feature with unique attributes); and checking that relationships of connectivity are maintained between graphic features (Montgomery and Schuch, 143).
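A simple automated check of the first kind might look like the following Python sketch; the layer names, feature codes and records are assumptions made for the example, and most GIS macro languages can express the same test.

```python
# Conversion specification: which layer each feature code must be placed on.
# The codes and layer names are illustrative assumptions.
spec = {
    "water_main": "WATER",
    "hydrant": "WATER",
    "parcel_line": "CADASTRAL",
}

# Converted features as delivered: (feature id, feature code, layer placed on).
features = [
    ("F001", "water_main", "WATER"),
    ("F002", "parcel_line", "WATER"),  # wrong layer; should be flagged
    ("F003", "hydrant", "WATER"),
]

# Flag every feature whose layer does not match the conversion specification.
errors = [(fid, code, layer) for fid, code, layer in features if spec.get(code) != layer]
for fid, code, layer in errors:
    print(f"{fid}: {code} found on layer {layer}, expected {spec.get(code)}")
```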
Manual quality control procedures normally involve creating edit plots of vector data and checking them against the source map data. QA requirements which will have to be met include: absolute/relative accuracy of map features must be met, and all features specified on the source map must be included on the edit plot; map annotation must be in the required format (e.g., correct symbology, font, color, etc.), and text offsets must be within the specified distance and of correct orientation; and plots of joined datasets must edge-match adequately (Montgomery and Schuch, 145).
Final Correction Responsibilities
Quality control editing of the digitized product is a crucial step in preparing spatial feature data. After initially digitizing a data layer, an edit plot (a hard-copy plot of the digitized features) is produced. The edit plot is printed at the same scale as the source data and checked by overlaying the plot with the source map on a light table. This edit check allows for the determination of errors such as misaligned or missing features. Corrections may then be made by adding, deleting or re-digitizing features. When digitizing on-screen, feature placement errors may be corrected by "rubbersheeting" the graphic features to fit the source data. Rubbersheeting is the process of stretching graphic features through the establishment of graphic movement "links" with a from-point (where the feature presently is located) and a to-point (where the feature should be placed). GIS graphic manipulation routines then move the graphics according to these specified links.
File Matching Procedures (Edge Match, Logical Relationships Within Data, Etc.)
Files which are going to be spatially joined must first have adequate edge-matching alignment of their graphic features. This entails a number of basic GIS graphic manipulation procedures: (1) coordinate transformation, which projects the data layer into its appropriate real-world coordinates; (2) rubbersheeting of the graphic features in one data file to accurately coincide with the adjacent graphic features in another file; (3) spatial joining, the combining of two or more data files into one seamless file spanning the geographic area of all files.
Coordinate transformation is the process of establishing control points upon the digitized layer and defining real-world coordinates for those points. A GIS coordinate transformation routine is then used to transform the coordinates of all features on the data layer based upon those control point coordinates. Once transformed, spatially adjacent data layers may be displayed simultaneously within their combined geographic extent. A determination may then be made as to the effectiveness and accuracy of the coordinates assigned to the data layers. If necessary, graphic features found in both data layers may be rubbersheeted to better align features which will need to be connected. For example, if the endpoint of a graphic feature representing a street centerline is not reasonably close to its corresponding starting point on the adjacent data layer, one or both of these graphic lines will have to be moved so that the graphic feature will connect. An alignment problem such as this can signal possible errors in the coordinate transformation and/or the source data. After features are accurately matched, the data files may be combined into a single data file. The combined data file will afterwards require editing and the development of new topological relationships in the new dataset. An example of one post-spatial-join editing procedure is the removal of extraneous graphic line-connection points ("nodes") which may interfere with various elements of the attribute database.
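In outline, a coordinate transformation routine fits an affine mapping to the control points by least squares and then applies it to every digitized vertex. The Python sketch below shows the mechanics with made-up control coordinates; production routines typically also report the residual error at each control point.

```python
import numpy as np

# Control points: digitizer coordinates and their known real-world coordinates.
# All values are invented for illustration.
digitizer_xy = np.array([[1.0, 1.0], [11.0, 1.0], [11.0, 9.0], [1.0, 9.0]])
world_xy = np.array([[615000.0, 4705000.0], [616000.0, 4705000.0],
                     [616000.0, 4705800.0], [615000.0, 4705800.0]])

# Fit x' = a*x + b*y + c and y' = d*x + e*y + f by least squares.
design = np.column_stack([digitizer_xy, np.ones(len(digitizer_xy))])
coeff_x, *_ = np.linalg.lstsq(design, world_xy[:, 0], rcond=None)
coeff_y, *_ = np.linalg.lstsq(design, world_xy[:, 1], rcond=None)

def transform(point):
    """Apply the fitted affine transformation to one digitized vertex."""
    row = np.array([point[0], point[1], 1.0])
    return float(row @ coeff_x), float(row @ coeff_y)

print(transform((6.0, 5.0)))  # a vertex near the middle of the sheet
```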
Final Acceptance Criteria
Standards for appropriate quality assurance, and accuracy verification procedures in general, depend greatly upon the data sources, the schema of the database for which data is being prepared, and the actual data conversion approaches applied. Acceptance of the joined digital map files depends upon the data meeting certain criteria. Criteria usually relate to accuracy, such as the determination of whether the product meets National Map Accuracy Standards at the appropriate scale. Other criteria may relate to whether attributes, if they have been added, are in order. Most acceptance determinations should be based on whether the feature data meets standards of accuracy, completeness, topological consistency, and attribute data content.
Building Main Database
One of the final stages in developing a GIS database is putting all the converted data together. Establishing one uniform database involves entering all attribute and feature data into a common database with an established, workable file/directory structure, sometimes known as a "data library." As the database is developed and data is ready for use, it can be released to the various data users for analysis. Once the database is established, it becomes important to maintain data accuracy and currency. If changes are made within the data layers, these changes must be defined and updates made to maintain the integrity of the database. Subsequent guideline documents deal with data integration and database maintenance.
Source Documents
There are a number of source documents which can be utilized as data for the attribute database. Many organizations are able to utilize their existing electronic database files and import this data directly into their GIS database. In the case of paper files relating to geographic areas, and attribute data existing on paper maps, this data will have to be manually entered into GIS attribute data files in the form of tables. Before this information is entered into a database, it must first be reviewed and edited. It is also important to have a procedural plan designed for the entry of this data in order to coordinate the flow of these source documents.
Pre-Entry Checking And Editing
A review of GIS attribute source documents can oftentimes reveal an unorganized mass of maps, charts, tables, spreadsheets, and various textual documents. The checking and editing of source documents is handled in the scrubbing phase of the project. Without a specific plan for the entry of these various data elements, it is highly likely that error will be introduced into the GIS database. It is crucial that all source documents be readable and properly formatted to allow for the most efficient entry of numerical and textual data. If the database conversion is being outsourced and the contractor is unable to read the source data, the resulting database will be inaccurate, more costly, or both. It is recommended that a formal scrub manual, designed according to the database and application requirements, be developed to help facilitate the supplementing of source data and its entry into the database. Logical consistency is an important element for both graphic and attribute elements. Records and attributes which are related to graphic elements within a network system must maintain logical relationships.
Document Flow Control
An organization will typically have a multitude of different document formats which it will need to use in coding all of its GIS attribute data. It is crucial that tracking mechanisms be implemented in preparation for the key entry process. Usually, duplication of the source documents used in the key entry process will not be feasible. As many source documents to be key-entered are used on a regular basis within the organization, it will be important to develop guidelines for tracking these documents when they are needed during the process. Timing and coordination will be factors in planning document usage.
Key Entry Process
As stated earlier, some organizations will be able to enter much tabular data into the database simply by importing existing tables or files into the GIS, or by relating tables which exist in their external DBMS. Normally, however, it will be necessary to enter attribute data into the system using a keyboard. Many organizations choose to use code lists when entering data from the keyboard. It is much more efficient during conversion to enter a 2- or 3-digit code which has a reference list associated with it. Typing a full description of the graphic into the text field takes longer and increases the chance of typographical error.
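For example, a short code list and its reference table might look like the Python sketch below; the codes and descriptions are invented for illustration. The operator keys the short code, and the full description is resolved from the reference list.

```python
# Reference list: short codes keyed by the operator map to full descriptions.
# The codes and descriptions are invented for this example.
land_use_codes = {
    "100": "Single-family residence",
    "200": "Commercial building",
    "300": "Vacant land",
}

keyed_value = "100"  # three keystrokes instead of a typed-out description
if keyed_value not in land_use_codes:
    raise ValueError(f"unrecognized land use code: {keyed_value}")
print(keyed_value, "->", land_use_codes[keyed_value])
```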
Digital File Flow Control
Numerous files will result from the key entry process. These files will need to be given proper names and directory locations in order to track and prepare the data logically for use within the GIS.
Quality Control Procedures
Most databases allow the user to specify the type of each field, whether numeric, alphanumeric, date, etc.; whether it has decimal places; and so on. This feature can help prevent mistakes, as the system will not allow entries other than those specified in advance.
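The sketch below illustrates the same idea with a small SQLite table created from Python; the field names and the allowed-value rule are assumptions. Entries that violate the declared checks are rejected at entry time.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE parcel_attributes (
        sbl       TEXT PRIMARY KEY,
        acreage   REAL CHECK (acreage > 0),
        land_use  TEXT CHECK (land_use IN ('100', '200', '300')),
        sale_date TEXT  -- stored as ISO yyyy-mm-dd text
    )
""")

# A valid entry is accepted.
con.execute("INSERT INTO parcel_attributes VALUES ('086.12-3-4', 0.25, '100', '1996-05-01')")

try:
    # This entry violates the CHECK constraint on land_use and is refused.
    con.execute("INSERT INTO parcel_attributes VALUES ('086.12-3-5', 0.31, '999', '1996-05-01')")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```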
There are a number of automated and manual procedures which can be performed to check the quality of attribute data. Some customized programs may be required for testing certain quality control criteria. Attribute value validity checks which may be performed include: verifying that each record represents a graphic feature in the database, verifying that each feature has a tabular record with attributes associated with it, determining whether all attribute records are correct, and determining whether all attributes calculated by applications are correct based upon the input values and the corresponding formulas. The translation of obsolete record symbology into a GIS-usable format, according to conversion specifications, is one procedure which will have to be conducted manually (Montgomery and Schuch, 145).
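A minimal sketch of the first two checks, comparing the identifiers carried by the graphic features against those in the attribute table (the ID values are invented), follows:

```python
# IDs carried by the digitized graphic features (invented for illustration).
graphic_ids = {"086.12-3-4", "086.12-3-5", "086.12-3-6"}

# IDs present in the attribute table delivered with the conversion.
attribute_ids = {"086.12-3-4", "086.12-3-6", "086.12-3-7"}

# Every record should match a graphic feature, and every feature a record.
records_without_features = attribute_ids - graphic_ids
features_without_records = graphic_ids - attribute_ids

print("records with no graphic feature:", sorted(records_without_features))
print("graphic features with no record:", sorted(features_without_records))
```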
The responsibility for checking and maintaining automated quality control procedures can be placed in the hands of the staff responsible for actual data conversion. When outsourcing data conversion, one of the most time-consuming aspects of the project is the evaluation of converted data once it has been received from the vendor. Usually, automated routines are developed which can be utilized in the evaluation of the datasets, and in determining if the data fulfills all of the requirements and standards stated in the contract. This process can be simplified by the client company delivering automated quality control checking routines to the data conversion vendor. The vendor is then able to run these routines, evaluate and edit the data so that it will meet requirements before it is even shipped to the client. Such a procedure saves valuable time and expenses which would otherwise have been spent on quality control evaluation, shipping and business communication.
Change Control
Final editing procedures and data acceptance are based upon whether major revisions in the data will need to be performed. After data verification and quality assurance checks, it may be necessary to re-evaluate the database design, the technical specifications of the data, and the conversion procedures overall. Ideally, the planning and design of the database will be sufficiently comprehensive and correct that the logical/physical database design will not have to be modified. However, it is rare that a data conversion project will push through to completion without some changes being necessary. Many conversion projects develop procedures which are used to identify, evaluate and then approve or disapprove the final products. A form should be developed which lists the desired changes which have been identified. The listing of desired changes is then evaluated in terms of both the volume of the data which has yet to be edited and the amount of data which has already been converted. The conversion vendor will usually develop documentation which describes the estimated costs/savings associated with the changes and final edits. Most organizations now accept the fact that changes will be a normal part of data conversion, and change requests are usually expected. The challenge then lies in the methods by which change mechanisms are developed and agreed upon between client and vendor.
Final Acceptance Criteria
Acceptance criteria are the measures of data quality which are used to determine if the data conversion work has been performed according to requirements specified. In the case of outsourcing of conversion, these criteria will determine if the data has been prepared according to the contract specifications. If the data does not meet these specifications, the conversion contractor will be required to perform any necessary editing upon the data to reach acceptable standards. Acceptance criteria and standards may vary between organizations.
File Matching And Linking
In most GIS packages which utilize relational database technology, the file matching and linking is a fairly simple process. Most GIS packages contain straight-forward procedures for joining and relating attribute files, which normally entails the selection of the unique identifying key between the graphic feature attribute table and any other data attribute tables. Once the identifier-link has been specified, the GIS software automatically establishes the relationship between the tables, and maintains the relationship between them.
Sources Of Digital Data
Digital spatial and attribute data can be obtained from a variety of sources. Various companies produce "canned" digital spatial datasets which are ready for use within a GIS environment. Utilizing an existing database is a good way to supplement data in the conversion process and is one of the best ways to save money on the cost of producing a database. Most federal, state, and local government agencies have data which is available to the public for minimal cost.
Two of the largest spatial databases which are national in coverage are the U.S. Geological Survey's DLG (Digital Line Graph) database and the U.S. Census Bureau's TIGER (Topologically Integrated Geographic Encoding and Referencing) database. Both contain vector data with point, line and area cartographic map features, and also have attribute data associated with these features. The TIGER database is particularly useful in that its attribute data also contains valuable Bureau of the Census demographic data associated with block groups and census tracts. This data is used today in a variety of analysis applications. Many companies have refined various government datasets, including TIGER, and these refined datasets offer various enhancements in their attribute characteristics, which increase the utility of the data. Unfortunately, problems associated with the positional accuracy of these datasets usually remain and are much more difficult to resolve.
Satellite and digital orthophoto imagery, raster GIS datasets, and tabular datasets are also available from various data producing companies and government agencies.
Transfer Specifications
Many government agencies produce spatial data in their own unique formats. Many full-feature GIS packages have the ability to import government spatial datasets into data layers which are usable within their own environment. Some agencies or companies distribute their data in the most common government data transfer formats (e.g., TIGER or DLG). Such policies allow for easy transfer to various systems.
Quality Control Checks
Quality control checks on external datasets will be necessary. Many government datasets, although extensive in their geographic coverage and in the utility of the associated data, do not always have the most accurate or complete data, particularly in terms of positional accuracy. It is always advisable to be skeptical of a dataset's accuracy statement and compliance with standards, and to fully test and evaluate the data before purchasing it or incorporating it into the database. The automated and manual quality control procedures discussed above for assessing both cartographic feature and attribute characteristics should be utilized in a quality assurance evaluation of the external data.
8. ACCURACY AND FINAL ACCEPTANCE CRITERIA
Acceptance criteria determine the standards with which data must comply in order to be usable within the system. Graphic acceptance standards for external digital data may be grouped into three cartographic quality types: relative accuracy, absolute accuracy and graphic quality. Standards for GIS data will normally depend upon the accuracy required of the dataset. In the GIS environment, accuracy will depend upon the scale at which the data is digitized and the scale at which it is meant to be used.
- Relative accuracy is a measure of the deviation in position between two objects on a map, normally described as plus or minus the number of measurement units (normally inches or feet) by which a feature is located apart from its neighboring map features, as compared to their locations in the real world.
- Absolute accuracy criteria evaluate the maximum deviation between the location of the digital map feature and its location in the real world. Many organizations set their absolute accuracy standards based upon National Map Accuracy Standards.
- Graphic quality refers to the visual cartographic display quality of the data, and pertains to aspects such as the data's legibility on the display, the logical consistency of map graphic representations, and adherence to common graphic standards. Placement and legibility of annotation, linework, and other common map elements all fall under graphic quality.
Informational quality is another accuracy criterion which should be given much attention in building a database. Informational quality relates to the level of accuracy of both map graphic features and their corresponding tabular attribute data. There are four basic categories for assessing these qualities:
- completeness
- correctness
- timeliness
- integrity
Together, these aspects of informational quality comprise the extent to which the dataset will meet the basic requirements for data conversion acceptance.
Completeness is an assessment of the dataset's existing features against what should currently be located within the dataset. Completeness may relate to a number of digital map features: annotation symbols, textual annotation, linework. Completeness will also relate to the attribute data, and whether all of the necessary attributes are accounted for. A typical requirement for the bottom limit of dataset completeness, when outsourcing conversion, is that not more than 1% of the required features and attributes will be missing from the digital dataset. For example, out of 80 roads that are located within a geographic area, if only 72 are included on the map, then only 90% of the data is included, and thus the map is only 90% complete.
Correctness relates to whether the information contained in the dataset is true and complete. If a map shows a number of roads and the linework is positioned correctly, but the roads are not labeled correctly, there is a problem with correctness. Correctness applies both to map features and to attribute data: a dataset may have the positional accuracy, or the completeness, to place an object correctly, yet still carry the wrong label for that object. Evaluating correctness can be done through automated or manual validation procedures used in testing the datasets. An example of assessing correctness is matching one dataset against another source to check for agreement. Every graphic and database feature has the potential for error.
Timeliness is another measure of informational quality, and is a particular form of correctness. Timeliness is based upon the currency of a dataset: if the data cannot be fully up to date, it must at least be no older than a specified age. The timeliness of a dataset begins from the date the dataset arrives at the client's door. From that point on, it is the responsibility of the client organization to maintain the data and its currency.
The integrity of a dataset is a measure of its utility. Graphically, database integrity means that the dataset maintains its connectivity and topological consistency: all lines are connected, there are no line overshoots or undershoots, and all features on the display represent real-world features. To maintain database integrity, there should not be any missing or duplicate records or features.

