
Glossary of Data-Related Terms

Common Terms Used to Discuss, Describe, Define and Manage Data

Introduction

Enterprise Data Management is a relatively new endeavor within government institutions. As such, it is important when talking about data management to understand the context for the commonly used terms and eliminate whenever possible any ambiguity that could arise within day-to-day communications. This glossary of terms is designed to provide a foundational vocabulary for anyone new to the discipline of data management. While it is not exhaustive of all terms used in this context, it does include those terms that are used most, and perhaps understood least.

Term Definition
Access Control The process of granting or denying specific requests for obtaining and using information and related information processing services. NIST SP 800-53, FIPS 201-2
Active Data Dictionary Repository for storing dynamically accessible and modifiable information relating to midrange-system data definitions and descriptions. Gartner IT Glossary
Advanced Analytics A form of BI in which analytics professionals carry out a detailed, code-driven investigation. Characterized by exploratory data analysis, statistical models, and machine learning.
API An Application Programming Interface, which is a set of definitions of the ways one piece of computer software communicates with another. It is a method of achieving abstraction, usually (but not necessarily) between higher-level and lower-level software. Resources.data.gov - a repository of federal enterprise data resources
Application A software program hosted by an information system. NIST SP 800-52, 800-37
Application Data Management (ADM) A technology-enabled business discipline in which business and IT work together to ensure the uniformity, accuracy, stewardship, governance, semantic consistency, and accountability for data in a business application or suite, such as ERP, custom-made or core banking. Application data is the consistent and uniform set of identifiers and extended attributes maintained and/or used within an application or suite. Gartner IT Glossary
Audit Trail Data in the form of a logical path linking a sequence of events, used to trace the transactions that have affected the contents of a record. FDA Glossary of Computer System Software Development Terminology (8/95)
Big Data High-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation. Gartner IT Glossary
BigQuery Google Cloud Platform's (GCP) scalable, managed enterprise data warehouse for analytics.
Blockchain A blockchain is an expanding list of cryptographically signed, irrevocable transactional records shared by all participants in a network. Each record contains a time stamp and reference links to previous transactions. With this information, anyone with access rights can trace back a transactional event, at any point in its history, belonging to any participant. A blockchain is one architectural design of the broader concept of distributed ledgers.
Business Glossary A compendium of business terms and definitions, which have been approved by stakeholders and are maintained and governed. The language representing the data should be aligned with the language of the business. The Office of the National Coordinator for Health Information Technology (ONC)
Business Intelligence Business intelligence (BI) refers to capabilities that enable organizations to make better decisions, take informed actions, and implement more-efficient business processes.
Business Intelligence (BI) Platforms Software platforms that enable enterprises to build BI applications by providing capabilities in three categories: analysis, such as online analytical processing (OLAP); information delivery, such as reports and dashboards; and platform integration, such as BI metadata management and a development environment.
Business Metadata Descriptive information employed to understand, locate, search, and control content. It can include elements such as terms and definitions (i.e., the Business Glossary), values, authors, keywords, and publishers. The Office of the National Coordinator for Health Information Technology (ONC)
Change Control The processes, authorities for, and procedures to be used for all changes that are made to the computerized system and/or the system's data; a vital subset of the Quality Assurance program and should be clearly described in the establishment's SOPs. FDA Glossary of Computer System Software Development Terminology (8/95)
Clinical Data Repository (CDR) An aggregation of granular patient-centric health data usually collected from multiple-source IT systems and intended to support multiple uses. Gartner IT Glossary
Collection The acquisition of information and the provision of this information to processing elements. USG Glossary - DOD, DOD Dictionary, JP 2-01
Content Management (CM) A broad term referring to applications and processes to manage Web content, document content and e-commerce-focused content.
Continuous Data Protection (CDP) An approach to recovery that continuously, or nearly continuously, captures and transmits changes to files or blocks of data while journaling these changes. Gartner IT Glossary
Critical [Data] Element An element of an entity or object that enables it to perform its primary function. USG Glossary - DOD, DOD Dictionary, JP 3-60
CRUD (Create, Read, Update, Delete) The four basic functions of persistent storage; also, guidelines for defining how different people or communities within an organization deal with data elements owned by the organization. Gartner IT Glossary
Dark Data The information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing). Gartner IT Glossary
Dashboard A centralized, interactive, and visual display of data used to monitor conditions or facilitate understanding.
Data Factual information, especially information organized for analysis or used to reason or make decisions. In Computer Science, numerical or other information represented in a form suitable for processing by computer. USG Glossary - White House, OMB, Circular No. A-16 Revised
Data Administrator One who manages access, security, and integrity of the database and monitors the performance of the database system to maintain any established service level agreements. USG Glossary - DOS/ USAID, FAM, 5 FAM 631.1
Data Administration The organization responsible for the definition, management, organization, and supervision of data within an enterprise or organization. A business function responsible for identifying, documenting, and modeling business information requirements. USG Glossary - DOS/USAID, FAM, 1 FAM 271-4
Data Analytics Interpretation of information in context, typically through use of statistical measures, data models, reports, and dashboards.
Data and Analytics The management of data for all uses (operational and analytical) and the analysis of data to drive business processes and improve business outcomes through more effective decision making and enhanced customer experiences. Gartner IT Glossary
Data Architecture Architectural framework for how data is stored, managed, and used in a system; describes how data is persistently stored, how components and processes reference and manipulate this data, how external/ legacy systems access the data. DHS Lexicon Terms and Definitions
Data Archiving The set of practices around the storage and monitoring of the state of digital material over the years. W3C
Data Asset Any entity that is comprised of data; may be a system or application output file, database, document, or web page; also includes a service that may be provided to access data from an application. NIST CNSSI 4009-2015
Data Breach The loss, theft, or other unauthorized access to data containing sensitive personal information, in electronic or printed form, that results in the potential compromise of the confidentiality or integrity of the data. USG Glossary - DVA, US Code 38, §5727
Data Catalog An organized inventory of data assets in the organization. It uses metadata to help organizations manage their data. It also helps data professionals collect, organize, access, and enrich metadata to support data discovery and governance. Oracle
Data Classification The assignment of a level of sensitivity to data that results in the specification of controls for each level of classification. Levels are assigned according to predefined categories as data are created, amended, enhanced, stored or transmitted. Information Systems Audit and Control Association (ISACA)
Data Corruption A violation of data integrity. FDA Glossary of Computer System Software Development Terminology (8/95)
Data Custodian An IT professional working with data systems to ensure data is stored, transported, and accessed appropriately.
Data Deduplication A form of compression that eliminates redundant data on a subfile level, improving storage utilization. In this process, only one copy of the data is stored; all the redundant data will be eliminated, leaving only a pointer to the previous copy of the data. Gartner IT Glossary
Data Dictionary A collection of information about data such as name, description, creator, owner, provenance, translation in different languages, and usage. ISO/IEC 25024:2015
Data Element A basic unit of information that has a unique meaning and subcategories (data items) of distinct value. Examples of data elements include gender, race, and geographic location. CNSSI 4009-2015 NIST SP 800-47
Data Element Standardization The process of documenting, reviewing, and approving unique names, definitions, characteristics, and representations of data elements according to established procedures and conventions. USG Glossary - DOS/ USAID, FAH, 5 FAH-5 H-111.5
Data Flow The flow of data from the input to output. Data flow includes travel through the communication lines, routers, switches, and firewalls as well as processing through various applications on servers. Information Systems Audit and Control Association (ISACA)
Data Governance A set of processes that ensures that data assets are formally managed throughout the enterprise. A data governance model establishes authority and management and decision-making parameters related to the data produced or managed by the enterprise. NIST CNSSI 4009-2015 NSA/CSS Policy 11-1
Data Integration The process of retrieving data from multiple source systems and combining it in such a way that it can yield consistent, comprehensive, current, and correct information for business reporting and analysis. Gartner IT Glossary
Data Integrity A property whereby data has not been altered in an unauthorized manner since it was created, transmitted, or stored. NIST SP 800-57 Part 1 Rev. 4
Data Interoperability Interoperability concerning the creation, meaning, computation, use, transfer, and exchange of data. ISO/IEC 20944-1:2013
Data Item A named component of a data element; usually the smallest element. FDA Glossary of Computer System Software Development Terminology
Data Lake A concept consisting of a collection of storage instances of various data assets. These assets are stored in a near-exact, or even exact, copy of the source format and are in addition to the originating data stores. Gartner IT Glossary
Data Life Cycle The sequence of stages that a particular unit of data goes through from its initial generation or capture to its eventual archival and/or deletion at the end of its useful life.
Data Lineage Data lineage is the journey data takes from its creation through its transformations over time. It describes a certain dataset’s origin, movement, characteristics and quality. Erwin
Data Literacy The ability to read, write, and communicate data in context, with an understanding of the data sources and constructs, analytical methods and techniques applied, and the ability to describe the use case application and resulting business value or outcome. Gartner IT Glossary
Data Loss Protection A set of technologies and inspection techniques used to classify information content contained within an object — such as a file, email, packet, application or data store — while at rest (in storage), in use (during an operation) or in transit (across a network). Gartner IT Glossary
Data Management The practice of putting into place policies, procedures, and best practices to ensure that data is understandable, trusted, visible, accessible and interoperable. DHS Lexicon Terms and Definitions
Data Mapping A method used to identify and link selected data to one or more equivalent standard data elements. DOS/ USAID, FAM, 5 FAM 613
Data Mining An analytical process that attempts to find correlations or patterns in large data sets for the purpose of data or knowledge discovery. NIST SP 800-53
Data Modeling Identifies informal graphical and textual representation and the entities and relationships involved in a data process; provides a mechanism for understanding the intended activity of a new system and designing the data. DOS/ USAID, FAM, 5 FAM 613
DataOps DataOps is the hub for collecting and distributing data, with a mandate to provide controlled access to systems of record for customer and marketing performance data, while protecting privacy, usage restrictions and data integrity.
Data Owner The individual(s), normally a manager or director, who has responsibility for the integrity, accurate reporting and use of computerized data. Information Systems Audit and Control Association
Data Profiling A technology for discovering and investigating data quality issues, such as duplication, lack of consistency, and lack of accuracy and completeness. This is accomplished by analyzing one or multiple data sources and collecting metadata that shows the condition of the data and enables the data steward to investigate the origin of data errors. The tools provide data statistics, such as degree of duplication and ratios of attribute values, both in tabular and graphical formats. Gartner IT Glossary
Data Protection The implementation of appropriate administrative, technical, or physical means to guard against unauthorized intentional or accidental disclosure, modification, or destruction of data. ISO/IEC 2382-1:1993
Data Quality The planning, implementation, and control of activities that apply quality management techniques to data, in order to assure it is fit for consumption and meets the needs of data consumers. DMBOK (via Dataversity)
Data Reference Model A representational framework whose primary purpose is to enable information sharing and reuse across all levels via the standard description and discovery of common data, and to promote uniform data management practices. DHS Lexicon Terms and Definitions
Data Replication The data replication segment includes a set of data replication products that reside in the disk array controller, in a device in the storage network or on a server. Included are local and remote replication products, migration tools, and disk imaging products. Also included are replication products specifically targeted as an alternative to backup applications. Not included are database replication products, log-based DBMS replication products or application-based replication products. Gartner IT Glossary
Data Security Physical, technical, and administrative measures used to safeguard protected information from unauthorized access, modification, use, disclosure, or destruction. USG Glossary - DOS/ USAID, FAM, 5 FAM 763.1-4
Data Set A collection of related records. FDA.gov Glossary of Computer System Software Development Terminology (8/95)
Data Standards The rules by which data are described and recorded. In order to share, exchange, and understand data, we must standardize the format as well as the meaning. USGS
Data Steward One who oversees and maintains consistent reference data and master data definitions, publishes relevant interpretation and proper usage of the data, and ensures the quality of the content and metadata. USG Glossary - DOS/ USAID, FAM, 5 FAM 631.1
Data Stewardship The most common label to describe accountability and responsibility for data and processes that ensure effective control and use of data assets. DAMA DMBOK
Data Strategy A highly dynamic process employed to support the acquisition, organization, analysis, and delivery of data in support of business objectives. Gartner IT Glossary
Data Structure The relationships among files in a database and among data items within each file. ISACA
Data Validation A process used to determine if data are inaccurate, incomplete, or unreasonable; the checking of data for correctness or compliance with applicable standards, rules, and conventions. FDA.gov Glossary of Computer System Software Development Terminology (8/95)
Data Virtualization Enables distributed databases, as well as multiple heterogeneous data stores, to be accessed and viewed as a single database. Data Virtualization servers perform data extract, transform and integrate virtually. DMBOK
Data Visualization A way to represent information graphically, highlighting patterns and trends in data and helping the reader to achieve quick insights. Gartner IT Glossary
Data Warehouse A storage architecture designed to hold data extracted from transaction systems, operational data stores and external sources. It then combines that data in an aggregate, summary form for data analysis and reporting for predefined business needs. Gartner IT Glossary
Data Wiping The process of logically removing data from a read/write medium so that it can no longer be read. Performed externally by physically connecting storage media to a hardware bulk-wiping device, or internally by booting a PC from a CD or network, it is a nondestructive process that enables the medium to be safely reused without loss of storage capacity or leakage of data. Gartner IT Glossary
Database A collection of data organized according to a conceptual structure describing the characteristics of the data and the relationships among their corresponding entities, supporting one or more application areas. ISO/IEC 2382:2015
Descriptive Analytics The examination of data, usually manually performed, to answer the question “What happened?” (or What is happening?), characterized by traditional business intelligence (BI) and visualizations such as pie charts, bar charts, line graphs, tables, etc. Gartner IT Glossary
Diagnostic Analytics A form of advanced analytics which examines data or content to answer the question “Why did it happen?” and is characterized by techniques such as drill-down, data discovery, data mining and correlations. Gartner IT Glossary
Digital Transformation Digital transformation can refer to anything from IT modernization (for example, cloud computing), to digital optimization, to the invention of new digital business models. The term is widely used in public-sector organizations to refer to modest initiatives such as putting services online or legacy modernization. Thus, the term is more like “digitization” than “digital business transformation.”
Document Management (DM) A function in which applications or middleware perform data management tasks tailored for typical unstructured documents (including compound documents). It may also be used to manage the flow of documents through their life cycles.
Dynamic Data Masking (DDM) An emerging technology that aims at real-time data masking of production data. DDM changes the data stream so that the data requester does not get access to the sensitive data, while no physical changes to the original production data take place. Gartner IT Glossary
Enterprise Data The sum of all data collected, created, used, managed, maintained, shared, and stored by entities and programs that warrants stewardship by the appropriate data stewards from an enterprise perspective. DHS Lexicon Terms and Definitions
Enterprise Architecture A strategic information asset base which defines the mission, the information, technologies, and the transitional processes for implementing new technologies in response to changing mission needs. NIST SP 800-53, OMB A-130
Enterprise Content Management Used to create, store, distribute, discover, archive, and manage unstructured content (such as scanned documents, email, etc.) and analyze usage to enable organizations to deliver relevant content to users where and when they need it. Gartner IT Glossary
Enterprise Information Management (EIM) An integrative discipline for structuring, describing and governing information assets across organizational and technological boundaries to improve efficiency, promote transparency and enable business insight.
Enterprise Metadata Management The business discipline for managing the metadata about the information assets of the organization. Gartner IT Glossary
ETL (Extraction, Transformation, Loading) A data pipeline used to collect data from various sources, transform the data according to business rules, and load it into a destination data store. Microsoft
Google Cloud Platform (GCP) Google Cloud Platform provides infrastructure as a service, platform as a service, and serverless computing environments, in addition to a multitude of cloud service offerings.
Google Data Studio Google’s entry-level, cloud-based visualization and dashboard platform. Part of GCP.
Information Life Cycle Information life cycle, as defined in OMB Circular A-130, means the stages through which information passes, typically characterized as creation or collection, processing, dissemination, use, storage, and disposition.
Information Management The function of managing an organization’s information resources for the handling of data and information acquired by one or many different systems, individuals, and organizations in a way that optimizes access by all who have a share in that data or a right to that information.
Information Sharing Exchange between entities or persons of data, information, or knowledge stored within discrete information systems or created spontaneously using collaborative communication technologies; includes transmission, communication, or any type of disclosure or receipt of information, as well as any provision or receipt of account access to a dataset or data repository.
Interoperability The capability to communicate, execute programs, or transfer data among various functional units in a manner that requires the user to have little or no knowledge of the unique characteristics of those units. ISO/IEC 2382-1:1993
Looker Google’s fully-featured, cloud-based visualization and BI report platform. Part of GCP, and intended for scalable, massively parallel relational database management systems.
Master Data Data held by an organization that describes the entities that are both independent and fundamental for an enterprise and that it needs to reference in order to perform its transactions. ISO/IEC 25024:2015
Metadata Information describing the characteristics of data including, for example, structural metadata describing data structures (e.g., data format, syntax, and semantics) and descriptive metadata describing data contents (e.g., information security labels). CNSSI 4009-2015 NIST SP 800-53 Rev. 4
Metadata Interoperability Interoperability concerning the creation, meaning, computation, use, transfer, and exchange of descriptive data. ISO/IEC 20944-1:2013
Metadata Repository A compendium of data asset knowledge, typically compiled and enhanced over time in manageable phases.
Microsoft Azure Microsoft Azure provides infrastructure as a service, platform as a service, and serverless computing environments, in addition to a multitude of cloud service offerings.
Open Data Public data that are made available consistent with relevant privacy, confidentiality, security, and other valid access, use, dissemination restrictions, and structured in a way that enables the data to be fully discoverable and usable by end users. White House, OMB, Circular A-130
PII (Personally Identifiable Information) Any representation of information that permits the identity of an individual to whom the information applies to be reasonably inferred by either direct or indirect means. NIST SP 800-79-2
Power BI Microsoft’s visualization and BI report platform. Part of the wider Power Platform line of tools for BI, automation, app development, and app connectivity.
Predictive Analytics A form of advanced analytics which examines data to answer the question “What is going to happen?” or more precisely, “What is likely to happen?”, and is characterized by techniques such as regression analysis, predictive modeling, and forecasting. Gartner IT Glossary
Prescriptive Analytics Advanced analytics which examines data or content to answer the question “What should be done?” or “What can we do to make _______ happen?”, and is characterized by techniques such as graph analysis, simulation, complex event processing. Gartner IT Glossary
Python An open-source interpreted high-level general-purpose programming language. In the context of data analytics, Python is an industry standard language for carrying out mathematical operations, data cleansing, data transformation, data visualization, data modeling, and data mining tasks thanks to a wide and well-supported ecosystem of libraries.
R An open-source programming language built for carrying out mathematical operations and data mining.
Real-World Data Usage and transaction data generated from direct measurement of business processes, logs, and hardware. Gartner IT Glossary
Record A group of related data elements treated as a unit. [A data element (field) is a component of a record, a record is a component of a file (database)]. FDA.gov Glossary of Computer System Software Development Terminology (8/95)
Records Management The process for tagging information for records keeping requirements as mandated in the Federal Records Act and the National Archival and Records Requirements. NIST CNSSI 4009-2015
Relational Database A database in which the data are organized according to a relational model. ISO/IEC 2382-17
Relational Database Management System A management system for relational databases. To use a relational database management system, data must be represented in a relational model that organizes it with specific characteristics (tables or relations, unique keys, etc.). ISO/IEC 25024:2015
Report A static document, table, or visualization that gathers data into one place and presents it visually.
Repository A database service capable of storing information, such as certificates and CRLs, allowing unauthenticated information retrieval. Repositories include, but are not limited to, directory services. NIST SP 800-15
Retention Period The minimum amount of time that a key or other cryptographically related information should be retained in an archive. NIST SP 800-57 Part 1 Rev. 4
Risk Management The process of managing risks to organizational operations (including mission, functions, image, or reputation), organizational assets, or individuals resulting from the operation of an information system. NIST FIPS 200
Self-Service Analytics A form of BI in which line-of-business professionals are enabled and encouraged to perform queries and generate reports on their own with nominal IT support. Characterized by simple-to-use BI tools, dashboards, and use of aliasing and semantic layers to make data easier to interpret.
Sensitive Data Any designated data or metadata that is used in limited ways and/or intended for limited audiences; may include personal data, corporate or government data, and mishandling of published sensitive data may lead to damages to individuals or organizations. W3C
Standard A standard is a document that provides requirements, specifications, guidelines or characteristics that can be used consistently to ensure that materials, products, processes and services are fit for their purpose. ISO
Structured Data Refers to data that conforms to a fixed schema. Relational databases and spreadsheets are examples of structured data. W3C
Target Data The data that is ultimately to be protected (e.g., a key or other sensitive data). NIST SP 800-133
Technical Metadata Descriptive information about data stored in physical databases, as well as its transformations through automated processes.
Unstructured Data Data that is more free form, such as multimedia files, images, sound files, or unstructured text. Unstructured data does not necessarily follow any format or hierarchical sequence, nor does it follow any relational rules. Resources.data.gov
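
Illustrative example

Several of the terms above (ETL, Data Validation, Structured Data) describe processes that can be easier to picture with a small worked example. The following is a minimal sketch in Python, not a reference implementation. The sample data, the field names (id, name, age), the validation rule, and the function names are all assumptions made for illustration; the "destination data store" is simply an in-memory dictionary. The duplicate check here works on whole records and is a simplification, not the storage-level Data Deduplication defined above.

```python
import csv
import io

# Hypothetical source extract: one row has a malformed age, one row is a duplicate.
RAW_CSV = """id,name,age
1,Ada,36
2,Grace,not-a-number
1,Ada,36
3,Alan,41
"""


def extract(text):
    """Extract: read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def is_valid(row):
    """Data validation: the name must be non-empty and the age must be digits only."""
    return bool(row["name"]) and row["age"].isdigit()


def transform(rows):
    """Transform: drop invalid rows, skip exact duplicates, convert field types."""
    seen = set()
    clean = []
    for row in rows:
        if not is_valid(row):
            continue
        key = (row["id"], row["name"], row["age"])
        if key in seen:
            continue
        seen.add(key)
        clean.append({"id": int(row["id"]), "name": row["name"], "age": int(row["age"])})
    return clean


def load(rows, store):
    """Load: write cleaned rows into the destination store, keyed by id."""
    for row in rows:
        store[row["id"]] = row
    return store


warehouse = load(transform(extract(RAW_CSV)), {})
print(sorted(warehouse))  # ids of the records that survived validation
```

In this sketch the row with the malformed age is rejected during validation and the repeated row is skipped, so only the records with ids 1 and 3 reach the destination store. A production pipeline would typically log or quarantine rejected rows rather than silently dropping them.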
