Development of an automated climatic data scraping, filtering and display system
Academic Article
Overview
abstract
One of the many challenges facing scientists who conduct simulation and analysis of biological systems is dynamically accessing spatially referenced climatic, soil and cropland data. Over the past several years, we have developed an Integrated Agricultural Information and Management System (iAIMS), which consists of foundation class climatic, soil and cropland databases. These databases serve as a foundation to develop applications that address different aspects of cropping systems performance and management. In this paper we present the processes and approaches involved in the development of a climatic data system designed to automatically fetch data from different web sources, consolidate the data into a centralized database, and deliver the data through a web-based interface. Climatic data are usually available via web pages or FTP sites. The exact steps to scrape data from different sources vary depending on how the data are rendered. The climatic data building process presented herein is broken down into five major program modules, corresponding to different phases of the process: Data Requester, Data Fetcher, Data Parser, Data Filter, and Data Explorer. The Data Requester is responsible for processing the web pages that lead to the determination of the requested weather data. The Data Fetcher is responsible for fetching the weather data made available by the data sources, based on the request from the Data Requester. The Data Parser is responsible for decompressing and parsing the contents of the original data file and saving the data to an SQL Server 2005 database. The Data Filter is responsible for data quality control, for estimating missing data, and for saving the filtered data. The Data Explorer is designed to provide web-based user access to the consolidated and filtered climatic data using both dropdown lists and map-based navigation.
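The Data Parser and Data Filter phases can be illustrated with a minimal Python sketch. It is not the iAIMS implementation: the CSV layout, the column names `date` and `tmax`, the plausibility range, the table names, and the use of linear interpolation to estimate missing values are all assumptions made for illustration, and an in-memory SQLite database stands in for SQL Server 2005.

```python
from __future__ import annotations

import csv
import gzip
import io
import sqlite3

# Assumed plausibility range for daily maximum temperature (degrees Celsius).
TMAX_RANGE = (-60.0, 60.0)


def parse(raw_gz: bytes) -> list[tuple[str, float | None]]:
    """Parser phase: decompress a gzipped CSV file and return (date, tmax) rows."""
    text = gzip.decompress(raw_gz).decode("utf-8")
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        value = rec["tmax"].strip()
        rows.append((rec["date"], float(value) if value else None))
    return rows


def quality_filter(rows):
    """Filter phase: reject out-of-range values, then estimate interior gaps.

    Gaps are filled by linear interpolation between the nearest valid
    neighbours (an assumed method); gaps at either end are left missing.
    """
    lo, hi = TMAX_RANGE
    values = [v if v is not None and lo <= v <= hi else None for _, v in rows]
    known = [i for i, v in enumerate(values) if v is not None]
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        before = [k for k in known if k < i]
        after = [k for k in known if k > i]
        if before and after:
            a, b = before[-1], after[0]
            filled[i] = values[a] + (values[b] - values[a]) * (i - a) / (b - a)
    return [(date, f) for (date, _), f in zip(rows, filled)]


def store(conn: sqlite3.Connection, table: str, rows) -> None:
    """Save rows to a table; `table` is a fixed internal name, not user input."""
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} (date TEXT PRIMARY KEY, tmax REAL)"
    )
    conn.executemany(f"INSERT OR REPLACE INTO {table} VALUES (?, ?)", rows)
    conn.commit()


def run_pipeline(raw_gz: bytes, conn: sqlite3.Connection):
    """Keep the parsed data and the filtered data in separate tables."""
    parsed = parse(raw_gz)
    store(conn, "parsed_climate", parsed)
    filtered = quality_filter(parsed)
    store(conn, "filtered_climate", filtered)
    return filtered
```

Writing the parsed and filtered rows to separate tables means a change to the quality-control rules can be re-applied to the parsed data without fetching the original files again.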
Three types of data are stored in the process: original climatic data in file format, parsed climatic data in a SQL Server database, and filtered climatic data in a SQL Server database. The resulting consolidated and filtered climatic database provides a common foundation that allows us to develop diversified applications that require dynamic access to near real-time data. A number of applications have been and are being developed that seamlessly access the foundation class climatic database. Collectively these applications address water conservation, crop production and management, land use suitability analysis, and bioenergy refinery site selection. © 2009 Elsevier B.V. All rights reserved.