9 Considerations to Improve Your Data Management for Research

Data management is a multi-billion dollar industry with stiff competition and an often confusing landscape. Although an expansion in the industry has given way to a period of contraction and consolidation, the ecosystem is constantly evolving and continues to change rapidly. Mergers, acquisitions and moves all have an impact on the tools and platforms used to manage information. New hardware and software tools can quickly transform data management.

For researchers, the process of gathering information to formulate a hypothesis, conduct experiments, or analyze and iterate on a research program can be daunting. The challenge grows when advanced technologies and big data come into play, and it only becomes more difficult under the increased pressure of regulations and security constraints.

To meet these challenges, research-driven organizations must take a strategic approach to data management. But what are the best practices for managing data in a landscape of ever-changing approaches, tools, and threats?

Data management can be broken down into a set of interconnected components. Taken together, these components provide a structure to help various stakeholders – data engineers, data scientists, IT operations staff, data users – understand how the evolution of data management is impacting the business, how research is constructed and conducted, the skills required of data users, and what may be on the horizon for the data management ecosystem.

We have identified nine key pieces of this puzzle:

  • Data movement
  • Data locality
  • Metadata management
  • Data integration
  • Research capabilities
  • Data catalogs
  • Data pipelines
  • Policy and governance
  • Intrinsic security and trust

Organizations should carefully consider how they approach these various components as part of their data management strategy to enable the business to research effectively, drive efficiencies, and protect data as the valuable asset it is. Read on for an overview of some of the components, or check out the full white paper, “Data Management for Research,” by Adam Robyak and Dr. Jeffrey Lancaster.

Data movement. There are a few trends that are likely to shape how the movement of data evolves over the next few years. First, companies are adopting hybrid cloud environments where data is stored both in on-premises infrastructure and with cloud providers, as well as on remote devices, in sensors, and on edge gateways. As researchers seek to use this data, it will need to be both accessible and secure, regardless of where it is stored. Second, machine learning is increasingly being used to automate manual tasks that were previously the responsibility of IT professionals. As a result, those IT professionals can expect to spend less time on rote processes and more time monitoring resource allocation and troubleshooting remotely.
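As a sketch of what location-transparent access might look like, the following Python snippet routes reads across multiple storage backends behind a single interface. The backend names (`on_prem`, `cloud`) and the dict-backed stores are illustrative stand-ins for real storage services, not any particular vendor's API:

```python
class HybridDataAccess:
    """Route reads across several storage backends so that callers need not
    know where a given dataset physically resides."""

    def __init__(self, backends):
        # backends: mapping of backend name -> dict-like key/value store
        self.backends = backends

    def get(self, key):
        """Return (backend_name, value) for the first backend holding the key."""
        for name, store in self.backends.items():
            if key in store:
                return name, store[key]
        raise KeyError(f"{key!r} not found in any backend")


# Usage: two toy stores standing in for on-premises and cloud storage.
access = HybridDataAccess({
    "on_prem": {"trial-42/raw": b"local bytes"},
    "cloud": {"trial-42/summary": b"cloud bytes"},
})
location, data = access.get("trial-42/summary")
```

In a real deployment each backend would wrap an actual storage client, and the router would also enforce access controls so that data stays secure wherever it lives.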

Data locality. Whether data is generated and stored in the cloud, in a data center, at the edge, or somewhere in between, understanding where it resides is essential to any data management strategy. Additionally, edge computing is a more recent consideration that has emerged in response to decentralized computing, Web 3.0, and disaggregated data, where the computational advantage lies in preprocessing the data so that only key, aggregated, or pre-analyzed data is transmitted back to a data center. And in some cases, the data doesn’t need to make a round trip to a data center at all; it can be fully processed at the edge. Edge computing can be used for a range of applications, from AI and analytics to inference and localized learning. Edge systems can also aggregate data from multiple endpoints, and they can act as relays or nodes in a distributed network.
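The edge-preprocessing idea can be shown with a minimal Python sketch: rather than shipping every raw sensor sample to the data center, an edge gateway transmits only a small aggregate. The particular statistics chosen here are illustrative:

```python
import statistics

def summarize_readings(readings):
    """Aggregate raw sensor readings at the edge so that only this small
    summary, rather than every sample, travels back to the data center."""
    return {
        "count": len(readings),
        "mean": statistics.fmean(readings),
        "min": min(readings),
        "max": max(readings),
    }


# A day of raw samples collapses to four numbers before transmission.
payload = summarize_readings([1.0, 2.0, 3.0])
```

The same pattern scales up: an edge node might run inference locally and forward only model outputs, keeping the round trip to the data center optional.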

Data pipelines. Data pipelines provide an organized and often efficient structure for delivering information from the data source to its destination. Pipelines should be automated where possible and can leverage machine learning and artificial intelligence to facilitate sourcing and ingestion. To get the most out of data pipelines, researchers need to be able to clearly explain where, when, and how data is collected. Researchers and organizations with a mature data management strategy are likely to employ multiple data pipelines.
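A minimal pipeline can be sketched in Python as a chain of source, transform, and sink stages, with provenance (where and when data was collected) attached at ingestion. The stage structure and the `sensor-array-1` source name are illustrative assumptions, not a prescribed design:

```python
from datetime import datetime, timezone

def source(records):
    """Ingestion stage: wrap each raw record with provenance metadata
    recording where and when it was collected ("sensor-array-1" is a
    hypothetical source name)."""
    for r in records:
        yield {
            "value": r,
            "source": "sensor-array-1",
            "collected_at": datetime.now(timezone.utc).isoformat(),
        }

def transform(stream):
    """Cleaning stage: drop malformed records and normalize values."""
    for rec in stream:
        if isinstance(rec["value"], (int, float)):
            rec["value"] = float(rec["value"])
            yield rec

def sink(stream):
    """Delivery stage: materialize the cleaned records at the destination."""
    return list(stream)

def run_pipeline(records):
    return sink(transform(source(records)))


clean = run_pipeline([1, "bad", 2])
```

Because each stage is a generator, records flow through lazily, and stages can be swapped or added (deduplication, enrichment) without touching the rest of the pipeline.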

Policy and governance. Policy and governance also create the expectation that researchers have a data management plan. The National Science Foundation and the National Institutes of Health, along with other federal agencies in the United States, require the inclusion of a data management plan as part of grant applications. Universities and colleges thus assume responsibility for good stewardship of the data generated by the research enterprise. That burden continues to increase as the amount of research data for which institutions are responsible grows exponentially.

Intrinsic security and trust. Trust gaps associated with current solutions present an opportunity for new and emerging technologies: Internet of Things deployments are secured through a combination of edge data collection, processing, and telemetry; data provenance solutions ensure data accuracy and legitimacy, even for physical items purchased through complex supply chains; and data security on hybrid cloud models protects data in transit. Even SecDevOps – the practice of integrating security, development, and IT operations into a contiguous and cohesive lifecycle management architecture – is a sign of the attention and importance placed on the need for trust in enterprise data management.
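One common technique behind data provenance – offered here as a generic sketch, not as any particular vendor's implementation – is a hash chain: each entry's digest covers both its record and the previous digest, so altering any step breaks every link after it:

```python
import hashlib
import json

def _digest(prev_hash, record):
    """Hash a record together with the previous link's hash."""
    blob = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()

def chain_records(records):
    """Build a tamper-evident provenance chain over a sequence of records."""
    chain, prev = [], ""
    for rec in records:
        h = _digest(prev, rec)
        chain.append({"record": rec, "hash": h})
        prev = h
    return chain

def verify_chain(chain):
    """Recompute every link; any modified record makes verification fail."""
    prev = ""
    for entry in chain:
        expected = _digest(prev, entry["record"])
        if expected != entry["hash"]:
            return False
        prev = expected
    return True


chain = chain_records([{"step": "collect"}, {"step": "ship"}, {"step": "analyze"}])
```

Verifying the chain confirms that no step in the data's history was silently altered, which is the property provenance solutions rely on to establish legitimacy.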

By deconstructing the components of a data management strategy, researchers can ensure both that they are responsible stewards of data and that they are using the best emerging technologies. While the responsibility does not lie entirely with researchers – it must be shared by research administrators, students, and others – it is only through collaboration among researchers, organizations, and IT operations that the optimal implementation of a data management strategy for research can be achieved.

For a more in-depth look at each of the components of a successful data management strategy, see the Dell Technologies white paper “Data Management for Research,” by Adam Robyak and Dr. Jeffrey Lancaster.

Copyright © 2021 IDG Communications, Inc.
