R7277 - A 2 DAY WORKSHOP TO CONSIDER ARCHIVAL POLICY AND PRACTICE FOR HISTORIC AND CURRENT TROPICAL FOREST INVENTORY DATA. FINAL REPORT.

FINAL REPORT OF WORKSHOP A two-day workshop to consider archival policy and practice for historic and current tropical forest inventory data.

This publication is an output from a research project funded by the United Kingdom Department for International Development (DFID) for the benefit of developing countries. The views expressed are not necessarily the views of DFID. R7277 Forestry Research Programme

Date of revision: 13 July 2000
Report compiled by: Nell Baker - 7 Newman Road, Littlemore, Oxford, OX4 3UJ, UK
Tel: +44 1865 777 223, Fax: +44 1865 271 036

Table of Contents

Introduction/Background
Aims of the Workshop
Structure of meeting

Papers presented
  ATROFI database
  Web page
  What are archives and why do they matter
  Dataset Archiving and Data Services at the National Data Repository
  Data management and archiving, the Statistical Services Centre experience
  Inventory and Data Retrieval - the view from NRI
  Archiving Census and Woodland Inventory Data
  Maintaining Forest Data for Future Use - A Commercial Perspective
  Volume/Biomass: Geo-referenced Forest Volume Data for Tropical Countries
  Managing environmental data

Discussions
  The ATROFI database
  Policy

Conclusions

Annexes
  Annex 1 Agenda
  Annex 2 Database structure

Acknowledgements
Acronyms
List of participants

ACKNOWLEDGEMENTS

The Forestry Research Programme of DFID is thanked for providing funding for the project and for the workshop. The Statistical Services Centre, University of Reading is thanked for providing the venue for the meeting and administrative support. Howard Wright is thanked for chairing the meeting. Sandro Leidi, Lorna Turner and Kelly Watkins are thanked for co-ordinating the meeting at Reading. Nell Baker is thanked for organising the workshop and for reporting on and editing the proceedings.

ACRONYMS

AONB - Area of Outstanding Natural Beauty (UK)
ATROFI - Archive of TROpical Forest Inventory
CAIRS - Computerised Agricultural Information Retrieval System (NRI)
CCT - Computer Compatible Tape
CEH - Centre for Ecology and Hydrology (NERC)
CIFOR - Centre for International Forestry Research (Indonesia)
COPR - Centre for Overseas Pest Research (now part of NRI)
DANI - Department of Agriculture Northern Ireland
DAT - Digital Audio Tape
DFID - Department for International Development (UK)
ECN - Environmental Change Network (NERC UK)
EIC - Environmental Information Centre, Monks Wood (NERC)
EU - European Union
FAO - Food and Agriculture Organisation (of the United Nations)
FC - Forestry Commission (UK)
FE - Forest Enterprise (UK)
FIADRS - Forestry Inventory and Analysis Database Retrieval System (US Forest Service, USA)
FRP - Forestry Research Programme (DFID UK)
GIS - Geographic Information System
HTS - Hunting Technical Services (UK)
ICRAF - International Centre for Research in Agroforestry (Kenya)
ICRISAT - International Crops Research Institute for the Semi-Arid Tropics (India)
IPR - Intellectual Property Rights
IRRI - International Rice Research Institute
ISO - International Standards Organisation
ITTO - International Tropical Timber Organisation (Japan)
IUCN - International Union for the Conservation of Nature
LRDC - Land Resources Development Centre (UK)
NP - National Park
NDAD - UK National Digital Archive of Datasets
NDR - National Data Repository (UK)
NERC - Natural Environment Research Council (UK)
NRI - Natural Resources Institute (UK)
ODA - Overseas Development Administration (now DFID)
OFI - Oxford Forestry Institute (UK)
PRO - Public Records Office of England and Wales
PSP - Permanent Sample Plot
RCHM - Royal Commission on the Historical Monuments
SCDB - Subcompartment database (UK Forestry Commission)
SSC - Statistical Services Centre (Reading University, UK)
SSSI - Site of Special Scientific Interest (UK)
TPO - Tree Preservation Order
TPI - Tropical Products Institute (now part of NRI)
TRADIS - Tropical Research and Agricultural Development and Information System
TROPIS - Tree growth and permanent plot information system
TSP - Temporary Sample Plot
ULCC - University of London Computer Centre (UK)
UoG - University of Greenwich (UK)
UNEP - United Nations Environment Programme
WGS - Woodland Grant Scheme (UK)

LIST OF PARTICIPANTS

Trevor Abell - Natural Resources Institute
Kevin Ashley - National Digital Archive of Datasets
Mark Atkinson - Web page Consultant
Allesandro Baccini - FRA2000 Consultant
Nell Baker - Oxford Forestry Institute
Graham Bull - Woodland Surveys, Forestry Commission
Eberhart Bruenig - Forestry Consultant
Melvin Cannell - Institute of Terrestrial Ecology, NERC
Henry Coleman - Timber Export Development Board, Ghana
Geoff Collett - Environmental Information System, Monks Wood (NERC)
Ian Dale - Statistical Services Centre, Reading University
Janet Foster - Archive Consultant
John Healey - School of Agricultural and Forest Sciences, Bangor University
Brian Kerr - Commonwealth Secretariat
Sandro Leidi - Statistical Services Centre, Reading University
Paul Phillips - Institute of Ecology and Resource Management, Edinburgh University
Barbara Pickersgill - Plant Sciences, University of Reading
Andy Roby - Department for International Development
Mike Roper - Association of Commonwealth Archivists and Records Managers
Julie Smith - Oxford Forestry Institute
Paul Smyth - Hunting Technical Services
Roger Stern - Statistical Services Centre, Reading University
Jenny Wong - Forestry Consultant
Howard Wright - Oxford Forestry Institute
Ma Xiangquing - Chinese Researcher based in IERM, Edinburgh University

WORKSHOP REPORT

INTRODUCTION/BACKGROUND

Current concerns about global warming, loss of global biodiversity and deforestation have thrown into sharp focus the role of tropical forests as a storehouse of carbon and biodiversity and as a potentially sustainable source of timber and other products. However, information about the biomass of tropical forests, their species composition and their productivity, both before and after human impact, is still very fragmented. Such information is needed to prioritise conservation initiatives, as a baseline against which to assess subsequent change, as a means of predicting future changes, as a target for restoration, and as an indicator of best management practices.

Over the years a considerable amount of inventory, growth and yield data from tropical forests have been accumulated in the UK by individuals and various organisations, both private and public. Many of these data are now no longer available in their country of origin. The archiving of these records has often been poor and in many cases has depended on the interest of a single person. There is thus a very real danger that this valuable information may be lost. There is clearly a need to establish a new archiving system to meet modern information requirements; this is the focus of a current DFID Forestry Research Programme project (R7277 Documentation of UK holdings of growth and yield, inventory and other data from tropical forests). On 30 and 31 March 2000 a project workshop was held at the Statistical Services Centre, University of Reading. The following is a record of the proceedings of this workshop.

AIMS OF THE WORKSHOP

The workshop had two aims: first, to publicise the database that has been created to store information about existing data holdings in the UK (ATROFI-UK); second, to reach a consensus on the main points to be included in a realistic archival policy and practice for such data.

STRUCTURE OF MEETING

The workshop was chaired by Howard Wright. An introduction to the project was followed by presentations of the database and the web page. Invited papers were then presented describing existing archive systems in the UK. Discussions followed on two main topics: the future of the database, and future policy in the UK for the archiving of tropical forest inventory data (see agenda in Annex 1).

PAPERS PRESENTED

ATROFI database
Sandro Leidi - Statistical Services Centre, Reading University

Prior to designing the database a survey was undertaken of existing databases that might have a similar purpose. Based on this survey, it was decided that the database format should have a very similar structure to TROPIS (Tree Growth and Permanent Plot Information System), using Access as an application to set up the database structure. Fields should be all those included in TROPIS plus additional ones as the information collated by the project comes from different types of inventory, not only Permanent Sample Plot (PSP) inventory. Unlike TROPIS, however, ATROFI does not always contain individual plot information.

The database was named ATROFI-UK as it represents an Archive of TROpical Forest Inventory for the UK. ATROFI-UK, being a meta-database, does not contain raw data but rather a summary of what is in the listed datasets and, most importantly, a contact address of the holder of the raw data. The purpose of a meta-database is to publicise and share the whereabouts and availability of datasets across institutions.

The structure of the database consists of 13 tables: eight main tables containing the information collected in the questionnaires, three junction tables to represent many-to-many relationships and two lookup tables (See Database Structure for a diagram of the structure of the database).

There are eight main tables in the database, centred around the core Study table. This table then links to three different study type tables: Permanent Sample Plots, Natural Forest Inventories and Volume Functions. These subset tables were necessary given that the information requested for each study type differs substantially. The questionnaire layout is also different for each of the three study types.

To preserve referential integrity and so avoid duplicates, a record cannot be entered in the study type subset tables if it does not already exist in the main Study table.
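The rule above can be sketched outside Access. The following is a minimal illustration only, assuming SQLite rather than the Access application the project used; the cut-down table layouts and the column names (study_id, title, n_plots) are hypothetical and are not taken from the ATROFI-UK design.

```python
import sqlite3


def build_schema(conn):
    """Create a cut-down Study table and one study-type subset table."""
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE Study ("
        " study_id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL)"
    )
    # Hypothetical subset table: each row must point at an existing Study row.
    conn.execute(
        "CREATE TABLE PermanentSamplePlots ("
        " study_id INTEGER PRIMARY KEY REFERENCES Study(study_id),"
        " n_plots INTEGER)"
    )


def add_psp_record(conn, study_id, n_plots):
    """Return True if the row is stored, False if SQLite rejects it
    (for example because no parent Study row exists)."""
    try:
        conn.execute(
            "INSERT INTO PermanentSamplePlots (study_id, n_plots) VALUES (?, ?)",
            (study_id, n_plots),
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
build_schema(conn)
conn.execute("INSERT INTO Study (study_id, title) VALUES (1, 'Example PSP study')")

accepted = add_psp_record(conn, 1, 12)  # parent Study row exists
rejected = add_psp_record(conn, 2, 5)   # no Study row with study_id 2
```

Making the subset table's key a reference to the Study key is one way to obtain both effects described: no orphan records, and at most one subset record per study.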

The Person table contains the names, details and address, when known, of anyone connected with a study. The person's role is also recorded: informant/holder, owner of the data, principal investigator or Intellectual Property Rights contact.

The Holding table is separate from the study table, as there may be more than one holding of the same data set within the UK. It is linked to the Study table by a many-to-one link via the Person_Study junction table. The Holding table describes the physical location of records in the UK and their archiving status.

The Publications table contains references, whether published or not, that describe any aspect of a study, for example the protocols used or the methodology for collecting volume data.

The VolumeFunctions table contains details of each volume function derived. This table is also a junction table because a Volume study can contain functions for many species and the same species can be studied in many Volume studies.

The Country table is a lookup table containing the 235 countries with the standard two-letter abbreviation from the ISO (International Standards Organisation) list. The other lookup table is the Species table, which has over 4,000 botanical names of tropical tree species, including synonyms.

Three junction tables define the many-to-many relationships between pairs of tables:
* Study_Species because a study can cover several species and a species may be studied in several studies;
* Person_Study because the same person can be involved in several studies (in the same or in a different role) and a study can have many people connected with it;
* Study_Publication because the same study can be mentioned in many publications and several studies can be described in the same publication.
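How a junction table resolves a many-to-many relationship can be shown with a small sketch. It assumes SQLite in place of Access and uses the Study_Species pairing as the example; the column names and the sample rows ('Study A', 'Species X' and so on) are invented placeholders, not records from the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Study   (study_id   INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE Species (species_id INTEGER PRIMARY KEY, botanical_name TEXT);
-- Junction table: one row per (study, species) pairing.
CREATE TABLE Study_Species (
    study_id   INTEGER REFERENCES Study(study_id),
    species_id INTEGER REFERENCES Species(species_id),
    PRIMARY KEY (study_id, species_id)
);
INSERT INTO Study   VALUES (1, 'Study A'), (2, 'Study B');
INSERT INTO Species VALUES (10, 'Species X'), (11, 'Species Y');
INSERT INTO Study_Species VALUES (1, 10), (1, 11), (2, 10);
""")


def studies_for_species(conn, botanical_name):
    """List the titles of all studies that cover the given species."""
    rows = conn.execute(
        "SELECT s.title FROM Study s"
        " JOIN Study_Species ss ON ss.study_id = s.study_id"
        " JOIN Species sp ON sp.species_id = ss.species_id"
        " WHERE sp.botanical_name = ? ORDER BY s.title",
        (botanical_name,),
    ).fetchall()
    return [title for (title,) in rows]


def species_for_study(conn, title):
    """List the botanical names of all species covered by the given study."""
    rows = conn.execute(
        "SELECT sp.botanical_name FROM Species sp"
        " JOIN Study_Species ss ON ss.species_id = sp.species_id"
        " JOIN Study s ON s.study_id = ss.study_id"
        " WHERE s.title = ? ORDER BY sp.botanical_name",
        (title,),
    ).fetchall()
    return [name for (name,) in rows]
```

The same junction table answers the question in both directions, which is why neither the Study table nor the Species table needs a repeating field.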

The relationships between these tables are shown in Database Structure. Each box represents a table and shows some of the fields. A line linking two boxes represents a relationship. Fields in bold denote the "key fields" that maintain the integrity of the relationships. At each end of the relationship line is a symbol: "1" indicates this table is at the "one" end of a one-to-many relationship; "∞" indicates the table is at the "many" end.

It was decided that access to the database would be primarily online. A rather rigid search system was proposed similar to that in place for the ECN (Environmental Change Network) and FIADRS (Forestry Inventory and Analysis Database Retrieval System of the US Forest Service) database retrieval systems. It is a very efficient method for information retrieval as it forces the user to search within what is held in the database and not to specify a vague query that may yield no result.
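The idea of a rigid search can be sketched as follows: the user is only offered values that actually occur in the database, so a query cannot come back empty. This is an illustration under assumptions, not the ECN or FIADRS implementation; the record layout and the sample records are invented.

```python
# Invented sample records; field names are assumptions for illustration.
RECORDS = [
    {"title": "Study A", "country": "GH", "study_type": "PSP"},
    {"title": "Study B", "country": "MY", "study_type": "Volume"},
    {"title": "Study C", "country": "GH", "study_type": "Inventory"},
]


def available_values(records, field):
    """Return the sorted set of values held for a field.

    These are the only choices the search form would offer.
    """
    return sorted({record[field] for record in records})


def constrained_search(records, field, value):
    """Search on one field, refusing any value not held in the database."""
    choices = available_values(records, field)
    if value not in choices:
        raise ValueError(f"{value!r} is not held; choose one of {choices}")
    return [record for record in records if record[field] == value]
```

For example, `available_values(RECORDS, "country")` would populate a drop-down list, and `constrained_search(RECORDS, "country", "GH")` is then guaranteed to return at least one record.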

Web page
Mark Atkinson - consultant

In December 1999 a dot com domain name was registered for the project and a home page set up. This had the address http://www.atrofi-uk.com. The ATROFI web site shows a summary of the project and its aims. The ATROFI database can also be searched on the web site. The interactive database operates on a series of text files exported from the Access database. These files are searched and displayed using Perl scripts.

Searches can be done separately for each of the three study types or all records can be viewed for each study type. Alternatively, selections can be restricted to the country where the study was carried out, by vegetation type and/or by the name of any person involved in such a study.

The searching is carried out in a number of stages:
1. A first search that gives a listing, for the study type selected, of study title, country and vegetation type. Clicking on a specific study leads to:
2. A summary that characterises the study: its design and protocol, geographical location, implementing institution, scope and year of implementation. Two further options are then given:
3. Detailed information about the archival condition of the dataset can be retrieved, as can the contact details of the data holder, to whom questions concerning the release/use of such data should be directed.

The search facility on the web page does not link directly to the Access database. A series of text files is created from the Access database and placed on the server. These files are searched using a search engine written in the Perl language.
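The approach of searching exported text files in stages can be sketched as below. The project's scripts were written in Perl and are not reproduced in this report; this is a Python illustration only, and the tab-delimited layout, the column names and the sample rows are all assumptions.

```python
import csv
import io

# Stand-in for one text file exported from the database (invented content).
EXPORT = (
    "study_id\tstudy_type\ttitle\tcountry\tvegetation\tholder\n"
    "1\tPSP\tStudy A\tGH\tMoist forest\tHolder One\n"
    "2\tPSP\tStudy B\tMY\tDipterocarp forest\tHolder Two\n"
    "3\tVolume\tStudy C\tGH\tMoist forest\tHolder One\n"
)


def load_export(text):
    """Parse the exported tab-delimited text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def list_studies(rows, study_type, country=None, vegetation=None):
    """Stage 1: list title, country and vegetation type for one study type,
    optionally restricted by country and/or vegetation type."""
    hits = []
    for row in rows:
        if row["study_type"] != study_type:
            continue
        if country and row["country"] != country:
            continue
        if vegetation and row["vegetation"] != vegetation:
            continue
        hits.append((row["study_id"], row["title"], row["country"], row["vegetation"]))
    return hits


def holder_details(rows, study_id):
    """Later stage: look up the data holder for a study chosen from the listing."""
    for row in rows:
        if row["study_id"] == study_id:
            return row["holder"]
    return None


rows = load_export(EXPORT)
```

Working from exported files means the web server never touches the master database, at the price of re-exporting whenever the database changes.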

The project home page also includes questionnaires to elicit information from interested parties about what type of data they would find useful and what they would use it for. Another questionnaire is provided for users to give details of new data sets.

What are archives and why do they matter
Janet Foster - Freelance Archives Consultant

From our brief discussion this morning it became apparent to me that you had already addressed many of the issues that I intended to cover. For example, you appear to be well aware of the importance of provenance, i.e. where the data came from, who collected it and when. But I will try to cover, from an archivist's point of view, the principles underlying the way archives and records are managed.

Firstly what are archives? And what is an archivist? Archives are records and documents created in the normal course of the life of an institution, family or individual, regardless of medium, which have been selected for permanent preservation due to their continuing value as primary source material. The creator of the records, whether organisation or individual, is also known as the provenance of the archive and knowledge of how, why and by whom the records were created provides the important context for the use and interpretation of them. As such archives are unique.

An archive may also refer to the physical repository; this is where material is processed, stored and where access to the material is provided. Despite long resistance, 'to archive' is now an accepted term and thus the word archive has also become accepted as a verb. In everyday language archive has come to mean anything that one has put away and is not using at the moment. For example, computer files are referred to as having been "archived" when they have been removed from the hard drive, but the intention here is usually to clear space and not to consider whether the files need to be kept. In the archivist's view records have not been "archived" until the decision that they are worthy of permanent preservation has been made and they have been transferred to the place where they will be permanently stored. In order for data to be archived in the official sense some appraisal of its permanent value has to have occurred. Assessing the data and making a decision about whether or not it should be kept is also part of the job of an archivist.

This appraisal process is part of records management and happens before records are archived. Note that all archives are records but not all records are archives. You may wish to bring this to bear on the data sets that you are looking at keeping.

Appraisal is the most difficult part of the archivist's job. The usual procedure is to devise retention schedules for the different series of records. This is done in consultation with the people who have created the records and involves deciding how long a record is current, how long it will be semi-current, i.e. its immediate purpose is passed but it may need to be referred to, and then when it can be disposed of. A record may be destroyed after this time or it may be considered worthy of permanent preservation as an archive.

The primary issue in making a decision about whether a record is an archive is evidential value. The archivist attempts to identify those records that will need to be kept to provide evidence of what was done. The second characteristic that is considered is the information value of the material, i.e. whether or not a record is of wider interest beyond its original use. These two provide the basis for the appraisal of records. From then one can go on to devise a summary of criteria for retention. These points apply equally to paper records and datasets. In assessing records for archival value it is paramount that archivists are objective.

An archive should contain records that are rich, concise and limited in quantity. If it is not clear what a record is then it is useless; for example, a photograph may be an interesting image but if there is no information about who or what is depicted then its use as a record is very much reduced. Also, content is more important than age: there will be records created today which are archives, such as the record of decisions which you make today about the future of this enterprise, whereas routine correspondence that might be 50 or even 100 years old need not be archived. However, we tend not to throw away anything prior to 1850!

Archivists give preference to records that provide summarised information. If there is no other use of the supporting data then it is only necessary to keep the summary, for example keep the annual accounts but not all the invoices, vouchers etc which provided the information. But here one must be careful not to confuse summarised information with analysed information. For example, just because the history of an organisation has been written it does not follow that all the records of that organisation can be thrown away. There will be further uses that can be made of them and different interpretations will produce different histories. One written history does not contain all the evidence and the records should have been kept.

In addition, archivists have to be careful not to concentrate on current research trends; there is a need to be objective and take a wider view of the secondary use that might be made. As it is not possible to predict where research is to go it is hard to make a decision about what might or might not be useful. There are many examples of a particular set of records being used for research which would not have been thought of at the time the records were archived.

Archivists also involve themselves in helping the creators of records to decide what might have value in the future. In the Qualidata Centre at the University of Essex I was involved in an effort to search out and save qualitative data arising from social science research. In addition, I advised researchers at the beginning of their projects on how to identify whether their data would merit permanent preservation and, if so, how to prepare it for archiving during the project.

As I mentioned before the provenance provides the context of the data i.e. where, when, and how it was produced as well as any sources of existing information about it and, most importantly, why the particular project was undertaken. The question we seem to be asking here is how can archival consciousness be raised in the organisations that are contracting the work that produces the datasets you wish to preserve. Your worries seem to be that existing, valuable datasets may not survive.

Dataset Archiving and Data Services at the National Data Repository
Kevin Ashley - UK National Digital Archive of Datasets

SUMMARY

I have been asked to speak to you about how we go about preserving databases in a digital archive. To that end, it will be worth defining what we mean by a few of these terms so that we can be sure we are all speaking about the same thing. Then I will describe the role of the organisations I am involved with in digital preservation, i.e. what we do for the government and what we do for others. I will finish by summarising the essential information and tools that we need in order to do our job. Whoever takes on the task of preserving your database will have similar requirements to ours, i.e. these requirements are not specific to how we go about our job.

WHAT IS AN ARCHIVE?

When those who work in computing talk about 'archives', it's often not easy to be sure exactly what they mean. The term has been used very loosely. Sometimes it refers to any means of storing large amounts of data, such as a tape robot. Often it means some types of data that are more awkward to get at than others, such as tapes in an off-site storage silo. In other cases, it means data that's more than three months old which has been moved automatically to some lower-cost storage medium. The term is rarely used to reflect genuine long-term preservation with some hope of the contents being comprehensible to someone other than its creators, but that is the sense in which I think we are using it today: 'archive' as it is understood by an archivist.

Janet Foster has already given you a good insight into the essential attributes of an archive, and to the role of the archivist. The key aspects I would like to bring out are that the material in an archive usually was not intended for publication: letters and minutes rather than books and pamphlets. The archive takes on the role of preserving and describing the material and making arrangements for access to it. The selection of what ends up in an archive is important, as typically archives seek to preserve material permanently, not for 5 years, or 50 years, but forever. All these points are as true of digital archives as they are of paper, whether we are dealing with databases or documents.

WHAT IS PRESERVATION?

Preservation, however, in a digital archive is different. With paper and parchment our concern is with the physical artefact as well as the information on it. Although we can make copies on microfilm or paper, they are seen to be inferior to the original in some way. Nonetheless, the methods of preserving paper are well understood. They involve correct environmental conditions, appropriate handling and sometimes chemical treatment of the paper to reduce acid content.

With digital information, our role is somewhat different. There is no concern here about the 'original' entity in any physical sense. A digital copy is identical to its original in any meaningful way. We are concerned about preserving the information content and about ensuring that the information continues to be accessible. This may mean altering it in some way (by changing the storage format) because of changes in technology or software. We need to decide what attributes of the information are immutable and worthy of preservation and which are accidents of technology and may be changed. For instance, the exact storage medium we use is typically not a factor that needs to be preserved. Whether your database is on a CD or a floppy is a matter of convenience, nothing else.

DIGITAL PRESERVATION PROBLEMS

Preservation is hence not as simple as it is in the paper world. The media used are not long lasting, and the methods used to store information change. We cannot examine the contents of a digital archive without an intermediary (hardware and software) to allow us to interpret it. As technology improves, people want information delivered to them in different ways.

We also need to take additional steps to ensure information is not intentionally or accidentally altered. Computer files are usually much easier to change without leaving visible evidence than is the case with paper documents. We can protect against this, but it requires specialist knowledge and techniques to do it. One of the most pressing problems is that context is quickly lost, since it often only exists in the minds of the people who created the information in the first place. Key attributes of the data, that are required in order to interpret it correctly, are often not recorded with the data if it was initially created for use by one person, or one research group.

DIGITAL PRESERVATION ADVANTAGES

On the other hand, digital archives do have advantages. We can copy material easily and cheaply, and the copies are as good as the originals. We can easily provide multiple ways to access the same archival material, suitable for different audiences. We can protect our original material against inadvertent or malicious damage more easily. We can provide access to researchers worldwide, without the need for them to visit us. We can provide very fine-grained control over who can access what parts of the archive, perhaps releasing different fields of a database, or different rows, to different people, without the need for manual methods to sift the material as is necessary with paper. We can automate the checking of our archive against decay, and we can easily represent complex inter-relationships between material in a database. All of these things are either impossible with paper, very difficult or very expensive.
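One common way of automating a check against decay is to record a checksum for every file and compare it again later. The sketch below illustrates that general technique and is not a description of how the NDR does it; the file name and contents are invented, and SHA-256 is simply one suitable hash.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Record a digest for every file under root, keyed by relative path."""
    return {
        str(path.relative_to(root)): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def check_manifest(root, manifest):
    """Compare the files under root with a stored manifest.

    Returns a list of (relative path, problem) pairs; empty means no
    recorded file is missing or altered.
    """
    problems = []
    for name, expected in manifest.items():
        path = root / name
        if not path.is_file():
            problems.append((name, "missing"))
        elif file_digest(path) != expected:
            problems.append((name, "changed"))
    return problems


# Small demonstration on a temporary directory with an invented file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "plots.txt").write_text("plot,dbh\n1,32.5\n")
    manifest = build_manifest(root)
    clean_report = check_manifest(root, manifest)      # nothing altered yet
    (root / "plots.txt").write_text("plot,dbh\n1,99.9\n")
    damaged_report = check_manifest(root, manifest)    # content now differs
```

The same comparison detects both accidental corruption and deliberate alteration, provided the manifest itself is stored somewhere the data cannot overwrite.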

WHAT IS NDR?

So why am I able to talk about this? I am responsible for managing the National Data Repository (NDR) at the University of London Computer Centre (ULCC), which is a single physical repository for large amounts of data from many different clients, along with the expertise to preserve it and provide access to it. We run a number of different services from this, with differing processes for taking material in and releasing it, different costs and different consumers for each service. Our work ranges from providing a simple 'safe-deposit-box', where we store data for an organisation without any knowledge of what the data is, and without providing access to anyone but the owner, through to a full archive service in which we catalogue the material, provide access to the public and provide support for researchers. All of this is operated on a cost-recovery basis.

THE WORK OF NDR

Tasks undertaken by NDR include:
* The conversion of material to standard forms, from proprietary ones.
* Migration to new media or new formats over time.
* The provision of controllable access, and auditable access, so that specific groups can be authorised to get at specific material.
* The provision of search facilities and their integration with other search portals and resource discovery networks (as they are now known).
* The cataloguing and/or indexing of material.
* The gathering of contextual information about material.
* The provision of user support to researchers wanting to access archive material.

I would stress that any or all of these tasks could be undertaken by the original owners of the material, or some other group with an interest in it. We can offer all of this, but we can be involved in only part of it if someone else undertakes to do the rest better or more cost-effectively.

WHAT IS NDAD?

NDAD (the National Digital Archive of Datasets) is the largest single client for the NDR. It is operated for the Public Records Office of England and Wales (the PRO) under a private finance initiative contract, the 'private finance' in this case being the University's own reserves. Our task is to deal with public records which take the form of datasets of one sort or another. The PRO and government departments are responsible for selecting what material is to be preserved (only 2% of paper records are preserved on average, the rest being destroyed at some point in their lifetime). Our role begins once the decision is taken to preserve a specific system: we deal with acquisition, conversion, cataloguing and everything through to provision of public access. We deal both with material that is available for reading by all and with highly confidential material which will not be open to the public for anything up to 100 years from now. Although our primary source material (the databases) is digital, we have to deal with a lot of paper as well, as much of the essential contextual documentation only exists on paper. We are dealing with computer systems old and new, from 1960 to the present day.

Amongst the information we look for is what might be described as micro meta-data: information describing, at the lowest level, the data types of individual fields within a dataset, the descriptions of these fields, any ranges or other restrictions that apply to their values, and so on. In many cases, this information is embedded in the database or application itself. In some cases, particularly where bespoke systems are involved, the answer may lie in the source code of the application.
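Field-level metadata of this kind can be written down explicitly and then used to check records. The sketch below is an illustration only: the FieldSpec structure is hypothetical, and the two example fields (plot_id, dbh_cm) and their ranges are invented rather than drawn from any dataset held by NDAD.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    """Lowest-level metadata for one field: type, description, allowed range."""
    name: str
    dtype: type
    description: str
    minimum: Optional[float] = None
    maximum: Optional[float] = None


def validate(record, specs):
    """Check one record against the field specifications.

    Returns a list of messages; an empty list means the record conforms.
    """
    errors = []
    for spec in specs:
        if spec.name not in record:
            errors.append(f"{spec.name}: missing")
            continue
        value = record[spec.name]
        if not isinstance(value, spec.dtype):
            errors.append(f"{spec.name}: expected {spec.dtype.__name__}")
            continue
        if spec.minimum is not None and value < spec.minimum:
            errors.append(f"{spec.name}: below {spec.minimum}")
        if spec.maximum is not None and value > spec.maximum:
            errors.append(f"{spec.name}: above {spec.maximum}")
    return errors


# Invented example fields for a plot measurement record.
SPECS = [
    FieldSpec("plot_id", int, "Plot identifier", minimum=1),
    FieldSpec("dbh_cm", float, "Stem diameter at breast height, in cm",
              minimum=0.0, maximum=500.0),
]
```

Keeping such specifications alongside the data, rather than only inside the application that created it, is what allows a later reader to interpret the values without the original software.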

META DATA

We also look for technical meta-data on a broader level, describing the inter-relationships between the elements of the dataset and the capabilities of the system used to process it. This can include information on what methods could be used to retrieve data (use of key fields, soundex name searches, and so on) and what reporting capabilities existed. Information on provenance and use is also sought, and on policy matters relating to the establishment of the system and what effects its use might have had.

The sources for the metadata are varied. Modern databases may have much of the metadata embedded inside the system, but more often we are looking at supplier and user documentation, internal organisational records, publications, our own specialist knowledge and oral history collected from those who created or used the data.

DIGITAL ARCHIVAL COSTS

I've been asked to say something about the cost factors involved in digital archival work. It is a very complex subject and one I will confess I still don't fully understand myself. However, amongst the factors that certainly have an influence are these:
* Do resources arrive in neat bundles?
* Is metadata attached to the data when it arrives?
* What are the access patterns: frequent or infrequent?
* Is contemporary knowledge of the data available?
* Size of the user base and the number of accessions
* The level of support required for depositors and users

What generally does not influence costs is the raw amount of data. For us that is about 1% of our total operating costs. To model any of these you need to start with a service model of what you are trying to do, and who you are doing it for.

NDAD SKILLS

The skills we have available in NDAD to do our work are varied. We have professional archivists working alongside specialists in databases and their uses. These are backed up by systems specialists who ensure data is kept safe and secure in systems that work 24 hours a day, and user support staff who deal with research queries. Our aim is not simply to preserve a set of numbers, but to document how the information was used, why it was collected, what influence it had and the context in which all of this took place. The information we gather is thus both historical and technical in nature.

As I said earlier, one can conceive of ways in which some of this work is done elsewhere by other groups, perhaps with a greater specialist knowledge about some of the archive's holdings. The possibilities are endless. Technology is certainly not the constraint any more in dealing with the archiving of computer-based material. The barriers facing us are more often organisational and motivational, and funding, as always, can also be a concern.


Data management and archiving, the Statistical Services Centre experience Roger Stern - Statistical Services Centre (SSC), Reading University

INTRODUCTION

The problems of management and archival in the forestry sector are also experienced in other areas. Examples include agroforestry and agricultural research data and the archival of climatic data. All countries routinely collect climatic data and the Meteorological Services try to provide these data for a wide range of users. The World Meteorological Organization (WMO) has recognised the complexity of the task facing many countries and has developed a system called CLICOM, which has been provided since the mid-1980s.

DATA MANAGEMENT

In considering data management there are three main topics to be considered:

* An inventory of data that are available. This is being covered by the current project, which is creating a database of the meta-data.

* The rescue of the historic data from past projects. This is not covered by the current project. It is important, but is not an easy task. An example of this type of work is the DARE (Data Rescue) project for climatic data that produced microfiche records of climatic data.

* The management and archiving of current and future project data. The preparation of recommendations on this topic is the second theme of this workshop.

For current and future projects there are two separate problems. First, is the technical problem of devising an efficient strategy for managing and archiving the data in ways that support efficient work on the project. Second is the question of ownership of the data and rights of different users to have access to the data. Both issues are important, but it is also important that they are kept as separate as possible. Sometimes ownership issues side-track the technical discussions and it then becomes difficult to develop an effective data management strategy.

The SSC act as the biometric advisors for DFID natural resources projects. As part of our work for DFID we produce a series of "good practice" guides. These cover mainly:

* Design of research studies

* Data management (and archiving)

* Analysis

* Presentation

All these guides are available for reading online or downloading. The most popular and best received guides tend to be those related to data management. Titles include:

* Data Management Guidelines for Experimental Projects

* Excel for Statistics: Tips and Warnings

* Disciplined Use of Spreadsheets for Data Entry

* The Role of a Database Package in Managing Research Data

* Project Data Archiving - Lessons from a Case Study

Within our Centre, our main work is concerned with statistical support, but we spend an increasing amount of time on projects that involve data management. This is because poor data management is often the main limitation in project teams' capability to exploit their research data fully. So in the past two years much of our work has been concerned with support on data management and archiving; our involvement in the ATROFI project is an example.

One common fault is that project teams sometimes underestimate the complexity of their data management tasks. They may manage data individually, partly to protect their ownership rights. There is then the assumption that the archival problem is small and can be left to the end of the project.

This is not normally a sensible strategy. It adds extra work at the end of the project in describing and sorting out which data needs to be permanently archived. This is work that does not benefit the project team and is therefore often incompletely done. We propose that new projects should concentrate on establishing a good data management strategy from the start. If data are considered generally and are well managed then the project should proceed more smoothly and the archiving work is considerably reduced.

Issues to be considered in devising a data management strategy include the following:

* Spreadsheets can be used if the structure of the data is simple, but have to be used with more discipline than is the practice of some scientists. Standard database software is often more appropriate and this can introduce issues of training for project team members. One practical difficulty is that most courses and most of the literature on database software are primarily concerned with business applications. The booklet on the role of a database package in managing research data gives further information.

* Project teams need to consider whether data from each research activity are just to be analysed separately. If combined processing of information from different activities is important, then the data should be managed in a single database as far as possible. Devising a common database structure for multiple studies may not be easy, but the rewards, in terms of ease of analysis and of archiving, can be great.

CONCLUSIONS

In conclusion it is clear that for all projects constructive work on archiving is only possible when we recognise that there is a problem and that a good solution is difficult and time consuming (as demonstrated by this project). For new projects archiving should be obligatory. However, it is less important than good data management during the project and we need to recognise that this is a difficult task.


Inventory and Data Retrieval - the view from NRI Trevor Abell - Natural Resources Management Division, Natural Resources Institute (NRI) Andrew Larkin - Librarian, Natural Resources Institute

BACKGROUND

NRI came into existence in 1988, following the move to Chatham and the amalgamation of the Land Resources Development Centre (LRDC), the Tropical Forest Products Institute and the Centre for Overseas Pest Research. LRDC, located close to the Directorate of Overseas Surveys at Tolworth (from which it was initially formed), was involved with natural resource assessment including forest inventory and it is therefore the work of that organisation that is particularly relevant to this meeting when looking at historical forest inventory data.

During the period that LRDC operated, forest inventories were carried out in: -

Ethiopia, Ghana, Gambia, Nigeria, Tanzania, Sudan, Belize (British Honduras), Bangladesh, Jamaica, Fiji, Indonesia, New Hebrides, St Helena and Solomon Islands.

After the formation of NRI, inventories involving our staff have been carried out in:

Belize, Ghana, Guyana, Somalia, Kenya, Malawi and Indonesia.

Additional studies could also be mentioned in other regions where forest mapping and resource assessment has been carried out, though not necessarily including inventory.

During the LRDC era and for the early years of NRI, up to 1996, the organisation was a fully government organisation that was administratively part of the Overseas Development Administration (ODA); in fact, for a few years, we had the rather awkward name of Overseas Development Natural Resources Institute. Work was undertaken with funding from ODA with occasionally financial involvement of other multi-lateral organisations such as the World Bank and FAO (Food and Agriculture Organisation of the UN).

DATA HANDLING

The forest inventories undertaken could be either the product of specific forestry investigations completed as a short to medium term exercise or the component of a long-term multi-sectoral land resource study. In all types of investigation there was some degree of involvement of local organisations - normally representatives of the forest department in relation to the work of forest inventories. Fieldwork was usually undertaken as a combined exercise between the local forestry staff and the LRD/NRI foresters.

With respect to data analysis and report preparation, there was not a uniform approach, but generally with the longer-term regional studies arrangements would have been made for data to be analysed in country. For the shorter, more localised studies, data analysis and report writing was undertaken often within the UK.

In terms of handling and storage of inventory records this means that in some instances the full field records were brought back to HQ, whereas, in other cases, all the original field records were left in-country together with many of the summary sheets and computer records. Only the final reports and associated annexes might be returned to the UK. Fortunately, in several instances much of the work in terms of volume equation derivation and construction of stand tables was often undertaken with the assistance of OFI (Oxford Forestry Institute), so Oxford has become a repository for at least some of this inventory data.

THE LIBRARY AND RECORDS

Following the transfer of NRI from the civil service to the University of Greenwich (UoG), a decision had to be made on what should happen to the extensive library (the largest single library of tropical natural resources in Europe) and all the other records. Funding support for the library from DFID gradually ceased and the library is being incorporated into the Medway Campus library of the University. The former components of NRI brought their own collections of forestry material, although the LRDC had by far the largest amount. For the most part, the three collections are still housed separately at Medway, but there is an on-going programme to transfer material onto the UoG catalogue. At the moment, the interest is in the post-1980 material and it is unlikely that the older archive material will ever be computerised and searchable on one single database.

All the LRDC collection (i.e. reports, papers, journal articles and books) has been computerised and is available on TRADIS (Tropical Research and Agricultural Information System), a searchable database mounted on the CAIRS system (Computerised Agricultural Information Retrieval System). This database is no longer networked but can be interrogated on a standalone server. It was compiled only from material held in-house. Field records were never itemised and included on the database unless they had been converted in some way into a recognised report annex and could be entered as such. There are around 5,000 documents on the TRADIS system with a forestry flavour. The TPI (Tropical Products Institute)/COPR (Centre for Overseas Pest Research) material was placed on a similar in-house database called TRAIS and has around 4,000 items of a forestry nature. The older pre-1979 material can be accessed from traditional card catalogues and this goes back to the 1890s. A separate technical card index, again covering the interests of TPI since its formation, has around 25,000 references for "forestry"; the majority relate to individual tree species.

At present there is no specific policy for the forestry collection. The NRI collection is managed as a whole with no bias to particular areas. There is an obligation to maintain this collection for future reference and there is considered to be no risk of withdrawal or disposal of the material. TRADIS is being transferred into the UoG system, but there is still a considerable backlog in this work.

There was great pressure on storage space and initially many of the field records were despatched to a central university store, with inevitably some disposal of material considered to be of no further interest. Fortunately, it seems that much of the former LRDC material was saved and was finally properly archived and put in the care of the Public Records Office at Hayes, Middlesex. A total of 300 feet of shelf space was despatched to Hayes in 1999; this includes all disciplines, not just forestry. All correspondence files were considered to be Government property and have been separately stored in the Public Records Office.

Reports and analyses have been produced from much of this forest data, but making further use of the raw field data in some other new way, although theoretically possible, in practice will give considerable problems unless the researcher can identify adequate notes to provide clear guidance on inventory design and any coding of records that has been used. All the inventories that have been undertaken since the transfer out of government ownership have been conducted as part of long-term studies and field records will be left in country.

In essence we have:

* A comprehensive library database system - but not a single one-stop search tool for all the collections.

* A continuing programme to merge all the older collections into the UoG system, but this is unlikely to be achieved for the oldest material - at least in the foreseeable future.

* Inventory records and field cards are no longer on site but have been transferred to Hayes. At present there is no overall database for the field records.

* Collections of field cards are known to be patchy and dependent on the level of partnership with in-country organisations at the time the inventories were being conducted.

THE FUTURE

Ideally, for any future inventory work, it would be sensible to ensure that a distillation of the key facts is submitted to a web-based database, perhaps under the control of FAO. Data summarised in the form already being undertaken by ATROFI-UK would seem to be ideal: -

Location, forest area, forest type, main species, type of inventory, sampling percentage, parameters recorded, type of data storage and the volume regression equations calculated.

In addition, investigators should be asked to supply stand tables, species lists and overall volume estimations, specifying whether this is for commercial or total volume.

FOREST INVENTORY REPORTS - NATURAL FOREST

BOTSWANA Forest Inventory and Management in the Baikea Forest of North-west Botswana. P. Henry, Project Report 44.

ETHIOPIA Southwest Ethiopia Forest Inventory Project : an inventory of Magada Forest. D. Chaffey, Project Report 28, 1978 revised 1980. Southwest Ethiopia Forest Inventory Project: an inventory of Munessa and Shashemane. D. Chaffey, Project Report 29, 1978 revised 1980. Southwest Ethiopia Forest Inventory Project: an inventory of Tiro Forest. D. Chaffey, Project Report 30, 1978 revised 1980.

KENYA Kenya's Indigenous Forests - Status, Management and Conservation. Editor Peter Wass, ODA - IUCN 1995, Overview of the KIFCOM project. Plus separate forest inventory reports 1-15, Hugh Blackett 1994.

MALAWI A Report on the Inventory of Dzalanyama Forest Reserve. Unpublished report (T. Abell), Forest Planning Unit, Lilongwe, 1995. An Inventory of Miombo Woodland in the Lower Shire Valley. T. Abell, 1993.

TANZANIA Tabora Rural Development Programme, Woodland Ecology Reconnaissance Survey. R. Lawton, Project Report 76, 1979.

SUDAN Report on the Forest Development Prospects in the Upper Kinyeti and Ngairigi Basins, Imatong Central FR. Jenkin, Howard, Thomas, Abell and Deane, Land Resource Study 28, 1977.

SOMALIA Charcoal in Somalia: A Woodfuel Inventory in the Bay Region of Somalia. Neil Bird and Gill Shepherd, Reference B009, NRI, 1989.

GAMBIA Inventory of the Mangroves above the Proposed Gambia River Barrage at Yelitenda, Gambia. M. Johnson, Project Report 54, 1978.

NIGERIA Land Resources of Central Nigeria. W. Howard, Land Resource Report 9, 1976.

BRITISH HONDURAS/BELIZE Inventory of the Coastal Plain of British Honduras. M. S. Johnson and D. Chaffey, Land Resource Study 15, 1974. Inventory of the Chiquibul Forest, BH. Johnson and Chaffey, Land Resource Study 14, 1974. Inventory of the Mountain Pine Ridge, BH. Johnson and Chaffey, 1972. A forest Inventory of Part of the Mountain Pine Ridge, Belize. LAND RESOURCE STUDY 13, 1973. Belize Mountain Pine ridge Forestry Project. J. Sandom, Project Report 209, 1987. Sustaining the Yield: Improved Timber Harvesting Practices in Belize 1992-98. Neil Bird, 1998.

JAMAICA An Inventory of the Carib Pine Forest in Central and Eastern Jamaica. M Johnson, D. Alder, M. Jefferson, Project Report 81, 1981.

BANGLADESH A Forest Inventory of the Sunderbans , Bangladesh (3 volumes). D. Chaffey, D. Miller, C. Myers, J. Sandom, Project Report 140, 1985.

HAITI Method of Inventory - Pine Forests in Haiti. M. Berry and K. Musgrave, Technical Report 2, 1977.

GRENADA Forest Inventory in Grenada. M. Johnson, Project Report 169, 1985.

FIJI Fiji Forest Inventory Vol. 1- Environment and Forest Types, Vol. 2 -Catchment Groups of VitiLevu and Kanavu Vol. 3 - Catchment Group of Vanua Levua. M. Berry and W. Howard, Land Resource Study 12, 1973.

NEW HEBRIDES New Hebrides Condominium, Erromango Forest Inventory Project. M. Johnson, 1968.

INDONESIA The Forest Resource: Site Investigations along the Trans-Sumatra Highway. Project Record 40, T M Abell, 1979.

SOLOMON ISLANDS Field Sampling of Camposperm Forest Santa Isabel, BSIP. T. Rees LRD Miscellaneous Report, 1964.


Archiving Census and Woodland Inventory Data Graham Bull - Woodland Surveys, Forestry Commission

BACKGROUND

The Forestry Commission (FC) Conservancies are responsible for administering grants to Private Woodland Owners in England, Wales and Scotland. Forestry in Northern Ireland is covered by DANI (Department of Agriculture Northern Ireland).

DATA HOLDINGS

Information retained by the Forestry Commission of relevance to this workshop is as follows

Woodland Grant Scheme

The Woodland Grant Scheme (WGS) database is continually being updated as new schemes are applied for. Old data is not deleted and regular backups are made but there is no intention of archiving data on the WGS. The database can be used to make queries about active grants for a particular date.

Constraint mapping

Coloured paper maps at 1:25000 & 1:50000 have been retained showing the earlier grant schemes and constraints e.g. SSSI's, AONB's, NP's, TPO's etc. Digital data snapshots have been saved on Optical Disk or DAT (Digital Audio Tape) in a fire proof safe and off site.

Subcompartment database

Forest Enterprise (FE) manages the State-owned forests and, from April 2000, will take six-monthly snapshots of subcompartments; this data is stored on the subcompartment database (SCDB). Woodland Surveys have archived the 1995, 1996, 1998 and current 1999 subcompartment data on SCDB.

Stock maps

The FE started saving digital versions of stock maps in 1997 but very few old paper editions have been kept.

WOODLAND SURVEYS HISTORY

The Forestry Commission (FC) was formed after World War 1, in September 1919. Since then there have been five main censuses or woodland surveys whose purpose has been to provide information about woodlands in the UK at the regional and national level. These surveys are summarised in Table 1 below.

Table 1. Great Britain Woodland Surveys

Year | Woodlands covered | Minimum woodland size | Method of survey
1924 | FC and Other | 0.8 ha | Questionnaire
1930-1938 | FC and Other | 2.0 ha | Sampling
1947 | FC and Other | 2.0 ha | Complete plus small woods
 | | 0.4 ha | Sampling
1965 | Other | 0.4 ha | Sampling
1980 | Other (except dedicated and approved) | 0.25 ha | Sampling plus non woodland trees
1995-2000 | FC and Other | 2.0 ha | Main survey - map all woodland >2 ha, 1% sample of woodland area
 | | 0.1 ha | Small Woods and Trees - sampling 1% of land area

1924 Census

Copies of report are available in the research library at Alice Holt. This includes a copy of the questionnaire used within the report but no map data is available.

1930 and 1938 Census

The 1930 census was a survey based on the 1924 questionnaire and copies of the report are available but there are very few documented details of the data that was used. The 1937 survey was never fully completed due to the outbreak of World War 2. A map is provided of areas (in Scotland) that were surveyed. The survey data is available for counties and an incomplete set of maps is also available.

1947-1949 and 1951 Census

The 1947 census was a complete survey of all woods over 5 acres, with small woods and hedgerows surveyed in 1951. Separate reports are available for England, Wales and Scotland. Photographic copies of the 1:10560 maps and main published reports have been retained, as well as an incomplete set of record maps. The latter provide details of the woodland types surveyed, including: Coniferous high forest, Mixed high forest, Broadleaved high forest, Coppice with standards, Coppice, Scrub, Devastated, Felled, Lost, Thorn colonisation.

The 1951 hedgerow survey was an assessment of Hedgerow volume carried out on sample woods between 1 and 5 acres. 441 maps were selected and a part of each sheet was assessed for small woods. All data and maps related to this survey are stored in PRO.

1965-1967 Census

Only one national report was published and copies are available in the FC. Unpublished county summaries are stored off site. FC management boundaries have gone through various changes over the years, and the variations are recorded in the annual reports. Note that the English conservancies are due to change again on 1 April 2000.

1980 Census

The 1980 census produced county, country and national reports which are all available. Reports were also produced for the conservancies of the day i.e. North, South, East and West Scotland etc. All reports have been saved to microfiche and microfilm. Field maps at 1:10560 and 1:10000 have been distributed throughout the FC.

The forest research library at Alice Holt has a collection of ground photographs taken for this survey. Aerial photographs by county have been donated to the Royal Commission on the Historical Monuments (RCHM) of Scotland, RCHM of England and the Air Photo Unit, Welsh Office, Cardiff. The FC combined information from aerial photographs and Ordnance Survey green plate (purchased on film) to mark up 1:50000 maps with FC dedicated and approved woodland. These were then scanned by Laser Scan (a Cambridge based company) using Laser Track (also used for banknotes) and the data was saved on 9" magnetic tapes.

Current National Inventory of Woodland and Trees

A pilot inventory was started in the Grampian Region. Several changes were made to the original design, i.e. a move to sampling 100 km tiles and to surveying all woodland regardless of ownership rather than splitting woodland types between FC and Other. The main woodland survey covers all areas of woodland greater than or equal to 2 ha. In addition, a survey of small woods and trees is included, covering small woods of 0.10 - 1.99 ha, groups of trees, linear features and single trees.

The main woodland survey made use of 1:25000 photography (LCS88 in Scotland and in England & Wales the FC either purchased existing cover or commissioned contracts to obtain photography). This photography is being distributed to conservancies for further use once field data has been validated.

The Grampian report and data is now lodged with the PRO via ULCC with a 30-year restriction on public access to the data. An Oracle database of sample woods and 1 ha squares is to be archived. In addition digital map data (Vector and Raster) are all archived off site on CD-ROM and DAT tape.

County reports in England & Wales, regional reports in Scotland, country reports and Great Britain reports are to be produced. Reports will be available on the FC web site. MS Office versions of reports are to be archived as well as data capture field manuals, field notes, field maps, statistical programmes, and VAX/VMS text files.

DATA USE

The following is an example of how the data is used to monitor and compare change.

The 1980 census revealed that within Wales there were 241,000 ha of woodland (>0.25 ha) representing an 11.6% land cover. The 1998 Forestry Facts and Figures report showed that there was 247,000 ha of woodland (11.9% cover). The 1998 inventory of woodland indicated that there was 271,000 ha of woodland over 2.0 ha (13.1% cover) and 281,000 ha of woodland over 0.25 ha (13.5% cover). Note: The latter figure is an interim estimate, Small Woods survey still being analysed.
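The comparison above can be checked with a few lines of arithmetic. The sketch below is our own illustration, not the Forestry Commission's method: it uses only the figures quoted in the text, derives the land area each area/cover pair implies, and computes the change between the two estimates that share the 0.25 ha threshold.

```python
# Figures quoted in the text for Wales: (woodland area in ha, % land cover).
estimates = {
    "1980 census (>0.25 ha)": (241_000, 11.6),
    "1998 Facts and Figures": (247_000, 11.9),
    "1998 inventory (>2.0 ha)": (271_000, 13.1),
    "1998 inventory (>0.25 ha)": (281_000, 13.5),
}


def implied_land_area(area_ha: float, cover_percent: float) -> float:
    """Total land area implied by a woodland area and its percentage cover."""
    return area_ha / (cover_percent / 100.0)


# If the figures are mutually consistent, each pair should imply
# roughly the same total land area.
for label, (area, cover) in estimates.items():
    print(f"{label}: implied land area {implied_land_area(area, cover):,.0f} ha")

# Compare like with like: both estimates use the 0.25 ha minimum size.
base_area, _ = estimates["1980 census (>0.25 ha)"]
new_area, _ = estimates["1998 inventory (>0.25 ha)"]
change_ha = new_area - base_area
change_percent = 100.0 * change_ha / base_area
print(f"Change 1980 to 1998: {change_ha:,} ha ({change_percent:.1f}%)")
```

All four pairs imply a land area of roughly 2.07 to 2.08 million ha, so the quoted areas and percentages agree with one another. Part of any apparent change between surveys may of course reflect differences in method and minimum woodland size rather than real change on the ground.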

DISSEMINATION

The Woodland Surveys site (www.forestry.gov.uk) is currently under construction within the Forest Research site. It will include links to Local Authorities and other related sites. The FC has a policy to provide information on woodlands, to monitor change and make comparisons with the earlier records.


Maintaining Forest Data for Future Use - A Commercial Perspective Paul Smyth - Hunting Technical Services

BACKGROUND

Since the 1950s, Hunting Technical Services Limited (HTS) has provided technical assistance in rural development initiatives. In this role, over 1300 studies and projects have been completed in more than 130 countries. This work has included many dedicated forestry sector assignments. All HTS projects generate data of one kind or another. These data are most commonly presented in text and tabular form as technical reports and their appendices. Many studies also involve the making of maps, generally produced through air-photo or satellite image interpretation, supported by ground survey. In recent years, map production has been achieved through digital cartography.

DATA COLLECTION AND DATA ARCHIVAL

Archiving of all this data is an issue that undergoes frequent consideration by HTS. All project reports are kept indefinitely, either in the Company library, or at a storage facility off-site. Working project documentation is retained in our stores for a period of five years, after which there is an active microfiche programme. By and large, copies of all maps either used or produced by the company are kept for internal use and for reference. Similarly, photographic negatives that allow the reproduction of satellite image mapsheets have been archived. The storage of digital satellite data and digital maps is more problematic. Air-conditioned storage conditions are required, and digital media and data formats can become obsolete. Additionally, the integrity of data held on these media cannot be guaranteed, as there is no procedure for regularly checking readability.
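One common way of checking regularly that digital holdings are still readable and unchanged is to record a checksum for every file when it is archived and to recompute the checksums at intervals. The sketch below is a minimal illustration of that idea, not a description of any HTS procedure; the function names are our own.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Read a file in chunks and return its SHA-256 checksum."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file under root (by relative path) to its checksum."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*")) if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict) -> list:
    """Return relative paths that are missing, unreadable or altered."""
    failed = []
    for relative, expected in manifest.items():
        try:
            if sha256_of(root / relative) != expected:
                failed.append(relative)
        except OSError:
            # Covers files that have vanished or can no longer be read.
            failed.append(relative)
    return failed
```

The manifest would be stored with the archive (and ideally a copy elsewhere); running `verify_manifest` on a schedule gives early warning of media failure while a good copy may still exist.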

Occasionally a data archive constitutes a deliverable project item, with the intention being to hold the data archive in the recipient country. In such cases, there has been relatively little additional effort involved in creating a second archive for storage at HTS head office.

While we are aware that the forest data accumulated by HTS may have a value that extends well beyond the time frame of the particular study, it is becoming increasingly difficult for us to ensure the long-term maintenance of an archive of our project data.

At HTS we are involved in fairly broad-based natural resources consultancy. In many projects, this involves map making, often using remote sensing data and GIS (Geographic Information System) analysis. There is a general rule of thumb that all data collected on a client's behalf by us remains the property of that client.

The forestry studies I have been involved in myself have been national reconnaissance level mapping projects. In such projects, data have been collected at known-location Temporary Sample Plots (TSPs) in order to characterise the vegetation classes that we have been mapping. Vegetation data is collected at these TSPs and this includes height, density (canopy cover), trees per plot, stems per plot, stem diameter and bole length, if appropriate.

Data collected in the TSPs were used to characterise the vegetation classes that we were mapping, and to assist in the interpretation of areas that we were not able to visit during the field data collection phase of the project. Sample TSP data were fair copied and included in an 'Interpretation Manual' that was a project deliverable (Yemen). I do not know what has happened to the completed field sheets for this project. As the Interpretation Manual was compiled in the UK, I suspect that these have been filed and will be stored at an archival facility off-site.

For Tanzania, all raw field notes were filed and deposited with the Client (the Institute of Resources Assessment, University of Dar es Salaam, Tanzania).

I am not sure, but I would think that field data and working (non deliverable) reports are usually kept; these would be boxed up at project end and sent to the head office where they would eventually end up on the shelves of archive agencies. The (remote sensing and GIS) data sets that we are involved with tend to be larger than those you have been talking about and there is a severe competition for appropriate storage space. We are currently in the process of deciding which digital data to dispose of, given that there is a transfer of use of our climate controlled room, from image-processing lab and tape store to a home for our networking routing equipment.

Project reports are kept indefinitely either in our office or off site. Working documents are retained for five years, after which there is an active micro-fiching system. Copies of all the maps made or used are kept in the office for internal use and reference. In the past we have kept them as aperture cards and as film positives.

Digital data archiving is much more of a problem as it tends to require expensive conditions such as air-conditioning. In addition, regular maintenance is needed such as winding on and rewinding Computer Compatible Tapes (CCTs). In the past, we used old imaging systems, data from which are no longer readable without effort on our part to translate the format to something more modern.

CCT data of value has been transferred to new media; there is a certain amount of effort involved in doing this and there is not necessarily any commercial reward for it. We also cannot guarantee the readability of the data. Part of the problem is that proprietary formats of the data may not be readable by current systems. There is a cost associated with getting such data into new formats and onto modern media.

For example, some GIS datasets or map composition files contain pointers to other files in other sub-directories, and when files are moved the pointers no longer work. Care must be taken in the archiving of these kinds of files.
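A simple pre-archiving check can catch many broken pointers. The sketch below assumes that the list of referenced paths has already been extracted from the project or map composition file; how to do that depends on the GIS package and is not shown. The function name and report categories are our own.

```python
from pathlib import Path


def check_references(archive_root: Path, references: list) -> dict:
    """Classify referenced paths by how safely they will survive a move."""
    report = {"ok": [], "absolute": [], "outside": [], "missing": []}
    root = archive_root.resolve()
    for ref in references:
        path = Path(ref)
        if path.is_absolute():
            # Absolute paths break as soon as the archive is relocated.
            report["absolute"].append(ref)
            continue
        target = (root / path).resolve()
        if root not in target.parents:
            # Relative path that climbs out of the archive directory.
            report["outside"].append(ref)
        elif not target.exists():
            report["missing"].append(ref)
        else:
            report["ok"].append(ref)
    return report
```

Only references reported as "ok" are relative, stay inside the archive directory and resolve to an existing file; anything else needs attention before the archive is written to its final medium.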

Some of the projects have specific instructions on archiving in the project document. For example, in mapping the woodlands of Tanzania, we were required to provide the maps in a medium that the client could read and we had to keep a duplicate copy. But such specification is often not the case and there is always the risk that data may be inadvertently lost or destroyed in country. So our working practices apply to material that is filed at our head office but not to material or data that is left in country. If the data does not come back to head office then the only possible access is via the reports that have been written.

There have been several conscientious individuals in the company who have in the past taken all the old data and kept it in their houses if they feel it is threatened by restructuring.

However, shelf space is limited, and our archiving policy and health and safety regulations may restrict the physical amount of data that we are able to store.


Volume/Biomass: Geo-referenced Forest Volume Data for Tropical Countries

Alessandro Baccini - Independent Consultant for FAO FRA2000

The FAO FRA 2000 project (Forest Resource Assessment) aims to collect information on forest cover worldwide in order to review the state of the world's forests and to track changes. The project has been active for three years and is now beginning to produce results.

The FAO have a database that is similar to ATROFI but is more basic as it only includes data that is useful for the FRA2000 project, that is, spatial volume data. The database includes two tables, one for general information and another providing further details that are added if the data is considered to be of use. When selecting data that might be of use it is necessary for the data to be accompanied by a map, and we found that this became one of the main constraints on data set selection. If the data cannot be geo-referenced then it cannot be used to determine spatial volume. A second criterion for selection was that all species should have been inventoried as we are interested in total volume. Likewise, it was important that the minimum diameter taken in the study was less than 40 cm. Finally we had to be certain that the data was reliable.
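The four selection criteria described above can be expressed as a simple filter. The sketch below is illustrative only: the record structure and the names `InventoryDataset`, `meets_selection_criteria` and `select_datasets` are assumptions made for this example and do not reflect the actual FAO table layout.

```python
from dataclasses import dataclass


@dataclass
class InventoryDataset:
    # Hypothetical record; field names are assumptions, not the FAO schema.
    name: str
    has_map: bool            # accompanied by a map, so it can be geo-referenced
    all_species: bool        # every species was inventoried (total volume)
    min_diameter_cm: float   # smallest diameter measured in the inventory
    reliable: bool           # judged reliable by the assessor


def meets_selection_criteria(ds: InventoryDataset) -> bool:
    """Apply the four criteria: map, all species, minimum diameter, reliability."""
    return (
        ds.has_map
        and ds.all_species
        and ds.min_diameter_cm < 40   # minimum diameter must be below 40 cm
        and ds.reliable
    )


def select_datasets(datasets):
    """Return only the data sets that satisfy every criterion."""
    return [ds for ds in datasets if meets_selection_criteria(ds)]
```

A data set that fails only on the map requirement would be rejected here, which mirrors the observation that the lack of a map was one of the main constraints on selection.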

This database exists in dbf format. The database was converted to Access in order to produce a report of the work that has been done and soon the data will be moved to an Oracle format for maintenance purposes. The metadata is not stored separately from the data itself but the data is linked to a GIS. This makes it possible to cross-link data with other information such as climatic data. The data is fed into a model to produce spatial forest volume information. The team has also developed a land use index from which one will be able to derive national level statistics.

This briefly explains the data needs of FRA 2000 and demonstrates an important use for the information collected on ATROFI. ATROFI, however, does not explicitly include spatial information and this would be of great value as the demand for data that can be used in spatial modelling is steadily increasing.

The method of data storage and analysis used by FRA 2000 has the advantage that both new and old data can be used and are of value. In the recent past there has been a tendency to collect less forestry inventory data and instead to concentrate more on land use mapping. This means that the old inventory data becomes more important and valuable. The older data provides a better indication of the potential carrying capacity or yield as more recent inventory data comes from forests that have already been logged.

The report of this work is available from the FAO. In addition a copy of the forestry paper 134 can be obtained from FAO.

The best data is plot level data but we only had this for a few countries. Considering the scale of the work, the fact that it was global and we were using 1 km square resolution we considered almost everything we could collate. We did not include forest inventory summaries for areas bigger than 3 million ha. The scale of summary often depends on the size of the country.


Managing environmental data

Geoffrey Collett - Environmental Information Centre, Monks Wood, Natural Environment Research Council (NERC)

The Centre for Ecology and Hydrology (CEH) has recently been formed from the old Institutes of Terrestrial and Freshwater Ecology, the Institute of Hydrology and the Institute of Virology and Environmental Microbiology. The Environmental Information Centre (EIC), at CEH Monks Wood, is the Designated Data Centre with responsibility for managing those NERC terrestrial and freshwater datasets not catered for by the National Water Archive. EIC is also responsible for developing data strategy policies and procedures.

EIC has a variety of data sources from large thematic programmes, through science programmes and NERC grants to voluntary recorder schemes. Data management within CEH also occurs within data centres that have specific responsibilities such as the Environmental Change Network and the Biological Records Centre.

We produce data management plans for the thematic programmes. These cover Intellectual Property Rights (IPR), copyright, Principal Investigator responsibilities, metadata, quality assurance, recommended data formats and media, stewardship etc. Each project has to prepare a data management plan. It has been suggested that the last instalment of grants should not be paid if the data management plan has not been prepared! It is intended that these data management policies will be implemented across CEH.

EIC's role includes ensuring adequate storage, stewardship, validation and dissemination of data. We also supply advice on quality assurance. When data is passed on for storage we require that data is deposited with adequate documentation.

We are also seeking out datasets for recovery and NERC has provided funding until recently for data archaeology. This activity often has an unexpected cost in the need to enlist senior staff to investigate the data from an expert point of view.

Discovery metadata catalogues are important for data dissemination, giving basic information to enquirers about the who, what, when and where of the dataset. EIC has a metadata search engine on the web that queries the underlying Oracle structure of our database, which is intended to contain records on all the CEH data sets.

In handling data requests we keep in mind that the Environmental Information Act states that environmental data has to be publicly available (but not that it has to be freely available). The EIC offers a data sales and licensing service for CEH data products. Our policy is to make data that NERC owns freely available to bona fide academic researchers. We also provide guidance on data management to our various research arms.

It is our brief to encourage public interest and awareness of environmental data. Recently we went live on the web with a phenology data set that should catch public interest. It allows users to update a database of phenological events such as the first flowering of snowdrops. Other examples of widely used products are the Countryside Information System, the Countryside 2000 dataset and the Landcover map.


Discussion

The ATROFI database

IS ATROFI USEFUL AND COMPLETE ENOUGH?

The ATROFI database, in its present form, was considered useful by all present. The following suggestions were considered appropriate for future development of the database:

* A global map could be provided as part of the front-end that would permit immediate assessment of the geographical range of datasets included.

* It was suggested that it would be useful to include information on the existence of stand tables, species lists or volume estimates that the dataset had been used to produce. It was agreed that the references to literature could provide this information as adding this to the database would be time consuming. It is intended that grey literature should be scanned into the database.

* It would be useful to have a field in the database stating what the data has been used for in the past as well as what it was collected for. The latter already exists but explicit objectives are not included.

* Although time consuming it would be useful to link the data with GIS systems by providing geo references for each data set.

* The database should include records of data sets that we are aware exist but that we have not yet been able to describe in full. Users could be asked to provide any additional information they might have on these data sets. This is particularly important for data sets we are worried about losing.

* It may be useful to add a field to the database that provides information on the amount of assessment that is still required to understand the data set.

* In addition a field could be added describing what the long-term risk is to the storage of the data and how valuable it is.

* It is important to collect feedback from the users on the usefulness of the database. A short questionnaire making such enquiries should be included on the web page. It is important to publicise the database a lot to stimulate interest.

FURTHER WORK ON ARCHIVAL

The project did not complete the collection of information on data sets held within the UK. Most of the relevant individuals have been contacted but work is still required on the data at Oxford and the NRI data held in Hayes. The latter has not yet been seen. In addition there may still be relevant data stored in geography and/or zoology departments that we do not know about. Although many of these have been contacted, when working across disciplines it sometimes takes several enquiries to different people within a department before data sets are discovered that fit our criteria. In some cases there may be only one person holding this kind of data in a department. Other sources of data that have not yet been looked at include management plans and MSc and doctoral theses.

It was agreed that we should concentrate first on data that is at risk of being destroyed or getting lost. In addition, with regard to data stored in universities or other institutions, the people who know about the data and who are key informants on the data are fast disappearing. Theses and other publications stored in libraries are relatively safe and the data should already be adequately described; filing cabinets full of data, however, are less secure and require input from more informants to understand the data. It was suggested that institutions should be asked to keep data for the next five years to ensure that no further valuable data is lost before it can be described and assessed for archival. Alternatively, it was suggested that there should be an intermediary holding place where data can await assessment before a decision is made about its value and archival.

There is a need to continue with the information collection. The amount of work required to collate information about a data set can be extensive and depends significantly on whether or not key informants are available. It was suggested that the time required to describe a data set (OFI and NRI) should be tested using one or two data sets. In some cases significant work may be required to 'rescue' data sets and this also needs to be assessed. In addition, the process can be assisted if the web page is publicised and a questionnaire is provided requesting further information about data sets already in the database. Once this has been completed the next logical step is to start including information on tropical forest inventory data held in Europe.

VALUE OF DATA

As well as describing this data the problem of finding an appropriate place to store it is also of concern. In order to make a case for safe storage, we need to be able to determine and describe how important these data are. The value of this kind of data needs to be assessed and some data sets will be more valuable than others. Some criteria need to be developed to value data sets.

In general the data is considered valuable because:

* Collecting forest inventory data is expensive and the cost of storing it pales in comparison. In most cases the data cannot be re-created.

* Some data loses its value as it gets older but the value of forest data increases with time, reaching a peak at about 100 years (depending on the rotation or the life expectancy of the trees).

* The best data is that which provides quantitative measurements at the plot level. Plot summaries and forest summaries are, however, also of value.

* The number of re-measurements and hence the time span increases the value of the data.

* If plot locations are adequately reported then re-measurement is possible. Thus the potential value of the data is increased.

* Much of the older data provides the only information left about the natural 'climax' vegetation of an area.

* Older data may also be the only source of data on potential volume for a site as, since the 1940s, most of these areas have been logged at least once. Older inventories tend to report higher volumes.

* The potential uses of this data need to be listed.

* Ground truthing for remote sensing data.

* Modelling environmental and climate change.

STRATEGY FOR MAINTENANCE OF DATABASE

The database needs to be maintained after the end of the project. The database should be maintained at an institution with a small continuing grant provided. The question is where should the database be held and who should be approached to fund it?

In considering the location for the database the need to survive institutional changes should be considered. The possibility that FRP might be able to provide funding for further maintenance was considered. The inclusion of more data sets in the database would increase its value, enabling it to attract funding for future maintenance. The cost of maintaining the database needs to be investigated; note that the cost of creating the data far exceeds the cost of preserving it.

Funding could be sought from the EU (European Union) for expansion of the database to cover data sets stored in Europe.

The database should be put onto a CD-ROM and needs to be publicised widely, overseas as well as in the UK. This publicity should provoke feedback and queries about data sets and increase potential interest from funding agencies.

CONTACTING DATA OWNERS - ACCESS TO DATASETS

Where data ownership is unclear the project had originally intended to approach source countries to ask about permission to use the data. However, it was decided that instead source countries would be provided with a copy of ATROFI, informing them of the existence of particular data. It would then be up to them to request copies of the data. Potential users of data would be required to contact the source countries for permission to use the data.

This system was chosen because firstly it was felt that it would be difficult for the project to locate officials senior enough to grant blanket permission for data use. Secondly, officials are unlikely to give blanket permission but might be able to give permission on a case by case basis, i.e. for certain uses. Thirdly, and consequently, if approached for blanket permission for data use, officials are more likely to place a total ban on use of the data.

We are also aware that some data sets held in the UK are sensitive and are sometimes being held unofficially. These need to remain confidential at present but should be kept securely as in 20 or so years they will still be of high value, they may no longer be sensitive and they may be the only copy left.

If an originator country decides that the data should not be stored in the UK or in an international organisation, negotiations on ownership will have to be opened.


Policy

THE EXISTING SITUATION

Data collected by the Land Resources Development Centre, now amalgamated into NRI, is currently awaiting assessment for archival value by the PRO. It has been placed in a set of boxes stored in Hayes. This data is not secure and is not yet documented or described. The project needs to make sure that no decision will be made on this data until it has been assessed by the project. DFID is the Government Department that is currently placed to make a decision about this data. If the PRO is made aware that this data is important but DFID decide that the data does not merit archival, the PRO will present it to other organisations for assessment. The PRO needs to be informed that this data is potentially of value. If the data were selected for archival then, although the material contains digital data, it is all on paper and therefore would not be transferred to diskette and stored on NDAD by the PRO but instead would be stored as paper documents.

The Centre for Ecology and Hydrology store data from environmental research in the UK. They are involved in some overseas projects but the data from these is not necessarily stored on the central database but rather on the databases at the individual research stations. There is no formal attempt to co-ordinate with European organisations. Open access to data is considered important although older scientists tend to be more concerned with ownership. Those data sets that are expensive to maintain are to be reviewed regularly. For much data it is cheaper to store it than to review it.

Through project reporting requirements the CEH ensure that data is submitted for storage. At the start of a project one is required to fill in a form that includes issues such as accounting, data management and health and safety. For example, there is a box on this form that one ticks to say that a data management plan has been completed. So the approach to ensuring compliance is similar to that for Health & Safety. Data storage, as long as it is something that occurs through the lifetime of a project, is not very expensive. Adding value, by contrast, can be more expensive, for example, improving the ease of access to databases or providing publicity.

ARCHIVAL PROCESSES

Information stored on electronic media requires a technology for its retrieval and technologies tend to become obsolete. Therefore, the PRO also stores a paper copy of the information if available, preferably on microfiche.

Initially NDAD devoted its efforts to storing valuable digital information that was at risk of being lost. They then included information that might be of immediate interest to the public and only opened to the public two years ago once a critical mass of such information had been stored.

NDAD store digital data that is available on electronic media. Paper records or reports may be scanned in but only if they relate to digital data stored on electronic media.

It is very important to describe data well so that it can be used in the future. The PRO do not undertake to do this kind of work and would not, for example, take on the job of describing the NRI data stored at Hayes; this would be considered DFID's job. Neither would the PRO convert digital data on paper to electronic format. Yet the PRO are keen to help people to get funding for this kind of work.

On the other hand, much of the work of NDAD has been on providing provenance information for data sets. They are able to do this as they have less data to deal with than the PRO. In addition, the data they receive has not come through an archival system in a department but rather from individuals within departments who have collected or have been working on the data. They encourage the older people to look through the old data as they are more likely to understand documentation of their own era than a present day clerk. Combining an older informant with someone who has technical qualifications and an understanding of current needs is a good way to do this work.

NDAD have also worked on the data sets themselves, for example cleaning them and making them usable. The ECN, on the other hand, will store the data and some information about it but will then require users to invest in the full recovery of the data. Most users are willing to do this. This does mean, however, that not all data sets are equally accessible and some of the less accessible data may never get used.

FACILITIES THAT CAN BE OFFERED TO INDIVIDUALS

The facilities for data storage that individuals should be able to utilise were discussed. Here the individuals referred to include retiring foresters, retiring forest researchers or consultants who have completed a certain piece of work. It was noted that the problem is more acute for the past decade than for earlier periods as information is now stored on diskettes and consultants are less tied to institutions that might store their data. There is a need for a system where people can deposit their data. There are no existing facilities of this type at present and data is lost or left in a box somewhere without adequate description.

THE IMPLICATIONS OF CONSULTANCY COMPANY RESPONSIBILITY

Consultancy companies such as Huntings Technical Services have large collections of data, some of which has been collected on DFID funded projects. Huntings appear to act relatively responsibly with regard to data management and storage yet certain issues of concern were identified with this system.

There are two main implications of a system where the consultancy companies who have collected the data are made responsible for the storage and maintenance of data. These are, firstly, secure storage of the data and, secondly, access to and ownership of the data.

Companies are not the most secure depositories for data as, although they may manage data responsibly they tend not to build in measures for data storage if the company closes. And companies do close down. Archival companies and government departments now exist that can store and retrieve data permanently in a secure manner. Both are able to assure confidentiality of data stored if required.

The ownership and access rights to the data are not entirely clear. It seems that in most DFID projects the data is owned by the company but can be used or distributed by DFID upon request. The company is required to look after that data and ensure that it is available if needed. In some cases companies could claim a certain degree of IPR as they have added value to the data collected. This is not a common problem as DFID contracts tend not to limit ownership to consultancy companies and companies working in development are aware that a more open co-operative mode of operation is needed to achieve good results. Of greater concern, however, is limited public knowledge of the existence of data. Companies that have collected data catalogue and store that data and know about it. This puts them at an advantage when bidding for projects that might build on or require data collected in the last few decades. Thus, one could argue that opting for a system where companies are made responsible for data storage could result in a situation where certain companies have an unfair advantage over others in a bidding process that is supposed to be fair.

Companies are required to cover the cost of data storage under their overheads. As the stock of data stored by a company increases the cost of maintaining it increases. If these companies do not gain a competitive advantage by keeping this data they are unfairly disadvantaged as they have to either bear or charge higher overheads on project bids. Clearly this needs to be carefully considered in any DFID information management or IPR policy.

THE IMPLICATIONS OF RETURNING DATA TO COUNTRY OF ORIGIN

The implications of relinquishing all responsibility for data storage and instead destroying data or returning data to the country of origin were discussed. This, in some senses, would comply with DFID's present policy on data storage for bilateral projects, where their interest does not extend beyond completed reports. But:

* The data is considered too valuable to destroy.

* Returning data to countries of origin would be very expensive. It was agreed that this could be undertaken if the country of origin would pay for the data compilation and transport costs.

* Those present felt that data should be stored safely in the UK. Firstly, because much of the data is of value to the country of origin, to the UK and to the world as a whole. Secondly, many of the countries of origin do not have the facilities or the capacity to store this data safely. And thirdly, much of the data was collected using UK government funds and it can be argued that the government has an obligation to its taxpayers to store such valuable data for future use. The government is and should be concerned that the investment it has made is good and that data is available for developing countries to use.

In some cases countries may not wish the UK to hold a copy of their data and in this case data should be either returned or retained with assurance that it will remain confidential. This needs to be negotiated with the countries in question.

IS ARCHIVAL OF THIS DATA AN INTERNATIONAL RESPONSIBILITY?

As much of this data is of international importance, is there an international responsibility for its archival? This is a possibility and organisations that might be approached in this regard include: CIFOR (Centre for International Forestry Research), FAO and ITTO (International Tropical Timber Organisation). Perhaps a facility could be provided for countries to request safe archival of their data? There may be precedents for such an arrangement in other disciplines. In meteorology there is no central data storage unit but there are international groupings for data storage and there is co-ordination with regard to data use and protocols.

FAO do not consider themselves responsible for data storage but as they are now responsible for the FRA2000 project they may be interested in storing and maintaining data that is of value for this purpose.

ITTO is a potential source of funding as they are interested in using this kind of data.

CIFOR has a policy that all information should be publicly available and therefore it may not be possible for confidential data to be stored with them.

UNEP (United Nations Environment Programme) is planning to undertake a biomass project; they might also be interested in setting up a data storage system or in assisting with the funding for such a facility.

The IRRI (International Rice Research Institute) encourages researchers to deposit data with them; their system of data management should be investigated.

ICRAF are advanced in this area and are presently involving themselves in a data management exercise. Reading University is working with them to develop their existing data management software into a more widely usable tool.

It was concluded that the possibility of an international organisation holding responsibility for ATROFI should be carefully considered.

THE FEASIBILITY OF HANDING THE DATA TO THE PRO FOR ARCHIVAL

The feasibility of storing this data with the PRO or NDAD was discussed.

NDAD would only be able to take data that had been collected through a Government Department. Therefore a portion of the data we are concerned about would not be eligible. Other data could be stored on NDAD if funding were provided for the initial 'provenancing' and digitising. The cost of storage after that would be low.

The ECN would be able to store the data described on ATROFI but funding would have to be found to describe and transfer the data to electronic format.

It was noted that it is also important to consider the fact that data storage is permanent once the PRO has been requested to undertake it. It would not be possible to get rid of the data at a later date and therefore this is an option that is only appropriate for very valuable information. As the value of forest data tends to increase with time, this was not felt to be a serious drawback.

Government Departments can relinquish their ownership of data. A drawback of insisting that data should belong to a government department, and is therefore eligible for consideration for storage via the PRO, is that this also gives government departments the right to destroy data if they decide it is of low value. To circumvent this, petitions can be made to government departments or to the PRO expressing the concern that certain data is of value.

ARCHIVAL POLICIES IN FUTURE

There is a need to develop better policies in the future to store and archive forest inventory data collected overseas by UK organisations and funded by the UK or by other donors. There is a need to ensure overseas capacity to manage and store such data. It is clear that many UK institutions that collect this kind of data have no data archive policy and hence much of the data is being lost.

A Government Department can relinquish the copyright for information that it collects and this is what DFID do for data collected on bilateral projects. DFID retain the copyright for data collected on research projects. At the end of a bilateral project a project assessment is undertaken but no assessment is made of the data collected on the project or of data management. It is necessary to find out if DFID's project completion report requirements say anything about this.

All Government Departments are now (or will soon be) required to have a functional appraisal policy; this means that they clarify their functions and on this basis they decide what sort of material should be archived.

* It would be advisable for new projects to include in their negotiations and final agreement something about allowing access to the data produced in the project. This would clarify IPR issues. In bilateral projects, because fresh data can be politically sensitive, confidentiality periods could be specified so that data could be made available for wider use 5-20 years after it was collected.

* Projects should also be required to undertake data management and, in the case of DFID projects, responsibility for the storage and management of data should be negotiated with partner governments. This should be incorporated into the reporting requirements.

* Those present at the meeting felt that DFID should take responsibility for keeping copies of raw field data collected on their projects.

* In the development of its functional appraisal policy DFID should take this into account. Scientists and the public should be consulted in the development of the functional appraisal policy.

* In particular, DFID should start assessing data in electronic format and submitting this to NDAD for storage where appropriate.

The DFID representative asked the project if they might be able to submit a list of criteria for the valuation and appraisal of digital data for archival. The project was also requested to provide advice on policy in general.

To start with we could build on our existing criteria for selecting data for inclusion in ATROFI.

It was suggested that all raw forestry data collected in the field should be kept, as this would not represent an excessive amount of data. If other sectors were included in this policy then the quantity of data might be unmanageable.

It was suggested that DFID projects should be required to undertake an annual appraisal of existing project data, describing data sets and providing valuation of the data and an assessment of their archival value. DFID should be requested to put this into practice.

It was suggested that data management capacity should also become a criterion for judging project bids.


Conclusions

ATROFI is a useful meta database. The following improvements could be made to it:

* A global map could be provided as part of the front-end.

* Related publications and grey literature could be scanned into the database.

* A field could be added stating what the data has been used for in the past as well as what it was collected for.

* Links could be made with GIS systems by providing geo-references for each data set.

* A record of data sets that are known but not yet fully described could be included.

* A field could be added to describe the amount of assessment that is still required to understand and use the data set.

* Fields could be added describing what the long-term risk is to the storage of the data and how valuable it is.

* It is important to collect feedback from the users on the usefulness of the database. A short questionnaire making such enquiries should be included on the web page.

Further work is required to collect information on data sets in the UK. In particular the NRI and OFI data holdings still need to be fully described. In addition valuable data is available in MSc and PhD theses on tropical forestry topics held within University libraries (in particular Oxford, Bangor, Edinburgh and Aberdeen). Although this data is relatively secure, summary descriptions of it (including an abstract of the dissertation) in the ATROFI database would render it more accessible.

The ATROFI database should be maintained at an appropriate institution (within the UK for the time being). A small grant needs to be obtained to cover the costs of maintaining the database.

The database, currently on the web, should be widely publicised to attract further funding, more studies for inclusion and ideas for expansion.

The database could be expanded to include information on European holdings of tropical forest data.

Many data sets described on the database are not usable in their current form but require significant cleaning and sorting work. It was decided that, in general, this should be the responsibility of potential users. Funding should be sought to undertake this work for particularly valuable data sets.

Many data sets are owned by government departments overseas. It is the responsibility of the potential user to seek permission to use data.

Once the UK entries are complete, copies of the database should be sent to forestry departments in the countries where the data originated. Requests for repatriation will be considered on a case by case basis, and copies of data sets will be provided on receipt of funds to cover the costs of creating and sending such copies.

Many data sets, although described in the database, are not secure. At present there is no UK strategy for the assessment and archival of tropical forest inventory data and some data is still at high risk of being lost. Such a strategy needs to be developed covering data collected by individual consultants, private companies, universities and government departments.

At present DFID projects require consultancy companies to take the responsibility for the maintenance and storage of project data. This raises issues that need to be resolved including copyright ownership, access to data, knowledge of the existence of data, long term security of data, possible use of data to generate unfair competitive advantage and hidden costs. DFID contracts need to specify more clearly the roles and responsibilities relating to data ownership, archival and access.

The British Government should take responsibility for the management and safe storage of forest inventory data collected on all overseas projects that it funds. The interests to which the UK government is responsible include taxpayers, the overseas governments it is assisting and the international community as a whole.

DFID needs to secure advice on an appropriate data management policy for both research and bilateral forestry projects, including a list of potential criteria for assessing and valuing data. The criteria used for data inclusion in ATROFI are a good starting point for the latter.

It is possible that responsibility for the management and storage of third party data more properly lies with an international organisation. This needs to be investigated. FAO, CIFOR, ICRAF and ITTO should be contacted in this regard.

ANNEXES

ANNEX 1 AGENDA

THURSDAY 30 MARCH

10.30 Registration and coffee

11.00 - 12.30 Session 1

* Presentation of the project (Howard Wright and Jenny Wong)
* Demonstration of the project database ATROFI-UK (Alessandro Leidi and Mark Atkinson)

12.30 - 14.00 Lunch

14.00 - 17.30 Session 2: Relevant archival policies/systems in the UK

* What are archives and why do they matter? (Janet Foster, Freelance Archive Consultant)
* Dataset Archiving and Data Services at the National Data Repository (Kevin Ashley, UK National Digital Archive of Datasets)
* Data archiving in current and future research projects (Roger Stern, Statistical Services Centre, Reading University)
* Inventory and Data Retrieval - the view from NRI (Trevor Abell, Natural Resources Institute)
* Archiving Census and Woodland Inventory Data (Graham Bull, Woodland Surveys, Forestry Commission)
* Maintaining Forest Data for Future Use - A Commercial Perspective (Paul Smyth, Huntings Technical Services)
* Volume/Biomass: Georeferenced Forest Volume Data for Tropical Countries (Alessandro Baccini, Independent Consultant FAO2000)
* Managing environmental data (Geoffrey Collett, Environmental Information Centre, Monks Wood)

FRIDAY 31 MARCH

09.30 - 13.00 Session 3: Group work on defining the elements of an archival policy covering such issues as:

* central archival facility?
* funding mechanisms
* data storage methods/systems
* IPR and repatriation

Final discussions and summing up

13.00 Lunch and departure
