Research, and even business, is becoming a collaborative enterprise that brings together multiple institutions, sectors and, increasingly, countries. Nowhere is this more apparent than in the natural sciences, where the phenomena being examined and the questions being asked are not contained within the borders of one discipline, institution, country, or continent. Both a reason for and often the purpose of collaboration in the sciences is the need to amass, maintain, and share large and diverse structured data resources that no one research team or institution has the resources or expertise to collect, make available, and maintain.
Such data-centric collaborations among researchers are providing profound and valuable benefits to the scientific enterprise and the general public, including:
Given the benefits of data-centric collaboration and sharing in the sciences, it is not surprising that organizational structures that use information technology to facilitate this activity are emerging. One such structure, called a collaborative data sharing network (CDSN), is being used to facilitate collaborations among dataset producers and users, resulting in successful sharing of data and knowledge across traditional disciplinary, organizational, geographical, and political boundaries.
FIVE CHARACTERISTICS OF A COLLABORATIVE DATA SHARING NETWORK (CDSN)
A prime example of a CDSN is DataONE, a collaborative earth observational data sharing network initiative supported by the National Science Foundation. DataONE is taking advantage of information and communication technologies to share data in a broader fashion than has been attempted in the past. It aims to ensure the preservation of, and access to, multi-scale, multi-discipline, and multi-national science data. DataONE is designed to transcend boundaries not only between field domains (e.g., biological and environmental), but also across organizational boundaries and, in the future, across national boundaries.
A collaborative network such as DataONE consists of various members with various capabilities and resources. Its proposed participants range from individual field research stations to governmental organizations (e.g., USGS, NASA, EPA). DataONE classifies these participants into users and nodes based on the level of services and fees for participating. Users are participants who will have the capability to access and store datasets with no fees, and nodes are the institution-based participants who, upon joining DataONE, will have the ability to store, distribute, and coordinate datasets. DataONE itself will act as coordinating nodes that will mediate and direct the information flows and manage the connection between different member nodes. These diverse participants have different capabilities in terms of knowledge, experience, and resources. DataONE aims to connect multiple data repositories, collected and preserved by various organizations without regard to size and location.
Notwithstanding the many benefits of data sharing, CDSNs such as DataONE face the same challenges as most data sharing initiatives. These challenges are embedded in social, legal, economic, and political factors and fall into four broad categories: technological, organizational, legal and policy barriers, and local context.
Technological barriers to data sharing exist when data sharing entities do not have compatible data architectures and technological infrastructures or consistent data definitions and standards. Data in different formats, with different definitions and content, and from multiple sources are difficult and costly to integrate into a single usable data repository or to improve so they are suitable for sharing.
Social, organizational, and economic barriers such as structural conflicts, managerial practices, lack of funding, institutionalized disincentives, and professional cultures can discourage data sharing. The intense competition in scientific fields may, for example, contribute to resistance to sharing data. Research about scientific data sharing has shown that fear of reputational damage if data are found to be faulty or lacking in some way is a deterrent to data sharing. Another deterrent is the lack of relevant resources to prepare data for sharing and to sustain sharing mechanisms. Scientists and institutions are not often recognized or rewarded for making datasets openly available and usually can’t spare the time or resources to prepare the labor-intensive documentation necessary to share data. Arranging for outside access and storage may involve lengthy and onerous negotiations or drawn-out administrative processes.
Legal and policy frameworks created by government, funding agencies, or other regulatory bodies often complicate the process of data sharing. Legal and policy mechanisms can create a paradoxical situation in relation to data sharing and may be the greatest obstacle in building a knowledge network. On the one hand, such frameworks can enhance data sharing by ensuring proper and accountable use of data and information as well as mandating the sharing of data. On the other hand, rigid policies and regulations, such as those designed to protect privacy, can often inhibit data sharing. Unresolved legal issues have been found to deter or restrain collaboration, even if the scientists or institutions are prepared to proceed.
Local context, in the case of DataONE and the natural sciences, can create unique challenges to data sharing. Datasets in ecological research are complex, heterogeneous, and highly context dependent. Natural scientists usually pursue a specific question about a specific phenomenon at a specific site. Each subject might have different characteristics and require a different methodology. Data quality is highly correlated with the context underlying production, storing, and initially intended use. Using a diversity of data from multiple sources and contexts may lead scientists to question the data’s reliability and its research value or usability.
ORGANIZATIONAL SUPPORT & COMMITMENT FOR CDSN
Organizational support plays a major role in sharing research datasets, particularly considering the heterogeneity of collaborators and the complexity of the data sharing process (Sayogo & Pardo, 2011; 2012). An analysis of survey responses from 587 researchers, using logistic regression and structural equation modeling, found that organizational involvement is crucial for two reasons:
The study also found that organizational support significantly influences the intention of researchers to publish their datasets.
The success of CDSNs such as DataONE depends on the ability of many, if not most, of the participating entities to overcome the challenges described above. Success of a CDSN, then, requires a new understanding of data sharing and calls attention to the following questions:
It is precisely these types of questions that CTG and others are trying to answer for DataONE and similar types of CDSNs. Through previous research, CTG modeled the complexity of data sharing initiatives including the interdependencies of technical and organizational capabilities and the relationship between those capabilities and successful data sharing. Building on this past research and new data on DataONE, four categories of capabilities continue to stand apart as critical to the success of a data sharing initiative.
CTG’s extensive work in cross-boundary information sharing and collaboration has consistently identified three factors as critical to the success of cross-boundary data sharing initiatives. Preliminary insights from scientific data sharing initiatives support these findings:
The sharing of research datasets is recognized as central to global efforts to advance science. Ensuring the success of sharing, however, is a difficult and challenging endeavor that goes beyond a single knowledge domain, organization, or nation. Encouraging the sharing of datasets in a collaborative network in the interest of advancing science requires balancing expected benefits with identified challenges. If data sharing is conducted within a collaborative network such as DataONE, where the actors are autonomous, heterogeneous, and geographically dispersed, sharing is not based purely on personal decisions but is also affected by social and institutional arrangements. Looking at scientific data sharing through the lens of CTG’s work in information sharing and collaboration provides new insights into how capability for data sharing in the scientific community can be created and advances in science enabled.
Djoko Sigit Sayogo, Graduate Assistant
Theresa Pardo, Director
Alan Kowlowitz, Senior Fellow