CorrosionAI

Corrosion Datasets for AI Research and Prediction

Structured, physics-validated datasets covering electrochemical, environmental, and material variables for machine learning-based corrosion prediction.

CorrosionAI Benchmark Datasets

Three curated benchmark datasets, validated against physics-based corrosion models, for physics-informed graph neural network (PI-GNN) training pipelines.

CorrosionAI-CO2-v3.2

CO2-driven corrosion rates in carbon steel under varying flow, temperature, and partial pressure conditions from laboratory and field measurements.

~5,000 validated data points
Formats: CSV, Parquet, JSON
Last updated: 2025-Q4

CorrosionAI-MultiEnv-v2.1

Cross-environment corrosion degradation data: atmospheric, marine, soil-buried, and industrial chemical environments across ferrous and non-ferrous alloys.

~12,000 data points
Formats: CSV, Parquet, JSON
Last updated: 2025-Q3

CorrosionAI-SensorCal-v1.4

Time-series field sensor data from 47 corrosion monitoring probe installations in operational oil, gas, and water injection systems.

~2 million readings
Formats: CSV, Parquet
Last updated: 2025-Q4

Public Corrosion Datasets for Research

Publicly available datasets commonly used in corrosion prediction research, evaluated for completeness and ML suitability.

Dataset | Source | Size | Key Limitations
NIST Corrosion Data | NIST | ~800 records | Limited environmental variables; static snapshots only
Mendeley CO2 Corrosion | Mendeley Data | 500-2,000 records | Inconsistent variable naming; mixed units
Kaggle Corrosion Images | Kaggle | 10K-20K images | Image classification only; not for rate regression
ICMT Ohio University | ICMT | Proprietary | Not publicly downloadable; consortium access
ASTM G1/G31 Reference | ASTM International | Varies | Paywalled; not machine-readable
UCI Steel Plates Faults | UCI ML Repository | 1,941 records | Manufacturing defects, not corrosion
MatNavi (NIMS Japan) | NIMS | ~3,000 records | Japanese exposure sites; registration required

Standard Variable Taxonomy

Canonical variable schema used by CorrosionAI for cross-dataset reproducibility and benchmarking.

Environmental

Variable | Unit | Typical Range
Temperature | °C | -10 to 200
CO2 Partial Pressure | bar | 0.0 to 30.0
pH | (unitless) | 1.0 to 14.0
Dissolved Oxygen | mg/L | 0.0 to 12.0
H2S Concentration | ppm | 0 to 5,000

Flow

Variable | Unit | Typical Range
Flow Velocity | m/s | 0.0 to 20.0
Wall Shear Stress | Pa | 0.0 to 500.0

Material

Variable | Unit | Typical Range
Steel Grade | Categorical | API 5L, AISI, UNS
Chromium Content | wt% | 0.0 to 30.0
Carbon Content | wt% | 0.01 to 1.5

Protection

Variable | Unit | Typical Range
Inhibitor Type | Categorical | Film-forming, Neutralizing
Inhibitor Concentration | ppm | 0 to 500

Target Variables

Variable | Unit | Typical Range
Corrosion Rate | mm/year | 0.001 to 50.0
Pitting Rate | mm/year | 0.0 to 20.0
Mass Loss | g/m² | 0.0 to 5,000
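
One way to make the taxonomy machine-checkable is a small range table keyed by variable name. The sketch below is illustrative only: the column names (`temperature_c`, `ph`, and so on) are hypothetical and are not CorrosionAI's published field names, while the bounds are the typical ranges listed above. Categorical variables are skipped rather than range-checked.

```python
# Hypothetical column names; bounds copied from the typical ranges listed above.
TYPICAL_RANGES = {
    "temperature_c": (-10.0, 200.0),
    "co2_partial_pressure_bar": (0.0, 30.0),
    "ph": (1.0, 14.0),
    "dissolved_oxygen_mg_l": (0.0, 12.0),
    "h2s_ppm": (0.0, 5000.0),
    "flow_velocity_m_s": (0.0, 20.0),
    "wall_shear_stress_pa": (0.0, 500.0),
    "chromium_wt_pct": (0.0, 30.0),
    "carbon_wt_pct": (0.01, 1.5),
    "inhibitor_ppm": (0.0, 500.0),
    "corrosion_rate_mm_y": (0.001, 50.0),
    "pitting_rate_mm_y": (0.0, 20.0),
    "mass_loss_g_m2": (0.0, 5000.0),
}


def out_of_range(record):
    """Return the names of fields whose value falls outside its typical range."""
    flagged = []
    for name, value in record.items():
        bounds = TYPICAL_RANGES.get(name)
        if bounds is None:
            continue  # categorical or unknown field: not range-checked here
        low, high = bounds
        if not (low <= value <= high):
            flagged.append(name)
    return flagged
```

A value outside its typical range is not necessarily wrong, so a check like this is better suited to flagging records for review than to rejecting them outright.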

Data Pipeline Architecture

Six-stage pipeline from raw data ingestion through production deployment.

1. Ingestion

Automated ETL from laboratory, field sensors, and published literature with unit harmonization and deduplication.
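
As a rough illustration of unit harmonization and deduplication, the sketch below converts raw records to canonical units and drops repeats. The raw record layout, the field names, and the choice of deduplication key are assumptions made for this example; the production ETL is not described in enough detail here to reproduce.

```python
# Conversion factors to canonical units (mm/year and bar).
# mpy = mils per year; 1 mil = 0.0254 mm. 1 psi = 0.0689476 bar.
TO_MM_PER_YEAR = {"mm/year": 1.0, "mpy": 0.0254}
TO_BAR = {"bar": 1.0, "kPa": 0.01, "psi": 0.0689476}


def harmonize(raw):
    """Convert one raw record (hypothetical layout) to canonical units."""
    rate, rate_unit = raw["corrosion_rate"]
    pressure, pressure_unit = raw["co2_partial_pressure"]
    return {
        "source_id": raw["source_id"],
        "corrosion_rate_mm_y": round(rate * TO_MM_PER_YEAR[rate_unit], 6),
        "co2_partial_pressure_bar": round(pressure * TO_BAR[pressure_unit], 6),
    }


def deduplicate(records):
    """Keep the first record for each distinct set of measured values."""
    seen = set()
    unique = []
    for rec in records:
        key = (rec["corrosion_rate_mm_y"], rec["co2_partial_pressure_bar"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

Harmonizing before deduplicating matters: the same measurement reported once in mpy and once in mm/year only becomes recognizable as a duplicate after both are in the same unit.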

2. Preprocessing

Outlier detection (IQR + domain rules), missing value imputation (MICE), and thermodynamic consistency checks.
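
The IQR-plus-domain-rule idea can be sketched with the standard library alone. The domain rule used here (a corrosion rate cannot be negative) and the fence multiplier of 1.5 are illustrative assumptions, not the pipeline's actual rule set; MICE imputation and the thermodynamic checks are not shown.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Tukey fences: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(rates_mm_y, domain_min=0.0):
    """Flag values outside the IQR fences or violating a simple domain rule.

    The domain rule (rate must not be below domain_min) is an assumed
    example of a physically motivated check.
    """
    low, high = iqr_bounds(rates_mm_y)
    return [r < domain_min or r < low or r > high for r in rates_mm_y]
```

Combining the two checks is useful because a statistical fence alone can admit physically impossible values when the sample is widely spread.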

3. Feature Engineering

Dimensionless groups (Re, Sc, Sh), Pourbaix diagram encoding, and graph topology construction from reaction networks.
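
For reference, the three dimensionless groups named above follow from their standard definitions. The sketch below assumes SI inputs and pipe diameter as the characteristic length; the function and parameter names are chosen for this example.

```python
def reynolds(density, velocity, diameter, viscosity):
    """Re = rho * v * d / mu (inertial vs. viscous forces)."""
    return density * velocity * diameter / viscosity


def schmidt(density, viscosity, diffusivity):
    """Sc = mu / (rho * D) (momentum vs. mass diffusivity)."""
    return viscosity / (density * diffusivity)


def sherwood(mass_transfer_coeff, diameter, diffusivity):
    """Sh = k * d / D (convective vs. diffusive mass transfer)."""
    return mass_transfer_coeff * diameter / diffusivity
```

With water-like properties (density 1000 kg/m³, viscosity 0.001 Pa·s) flowing at 1 m/s in a 0.1 m pipe, `reynolds` gives a value of about 100,000, which is well into the turbulent regime.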

4. PI-GNN Training

Graph neural network training with physics loss terms: Arrhenius constraint, mass balance, Nernst equation.
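
To show how a physics term can sit alongside a data-fit term, the sketch below builds a composite loss with an Arrhenius penalty only. It is not the PI-GNN loss itself: there is no graph network here, the activation energy is assumed to be known, the penalty weight is arbitrary, and the mass balance and Nernst terms are omitted.

```python
import math

R_GAS = 8.314  # universal gas constant, J/(mol*K)


def mse(pred, target):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def arrhenius_residual(pred_rates, temps_k, activation_energy):
    """Variance of the implied ln(A) across samples.

    For a pure Arrhenius process, ln(rate) + Ea / (R * T) equals the same
    constant ln(A) for every sample, so this variance should be zero.
    """
    ln_a = [math.log(r) + activation_energy / (R_GAS * t)
            for r, t in zip(pred_rates, temps_k)]
    mean = sum(ln_a) / len(ln_a)
    return sum((x - mean) ** 2 for x in ln_a) / len(ln_a)


def total_loss(pred_rates, true_rates, temps_k, activation_energy, weight=0.1):
    """Data-fit term plus a weighted physics penalty (weight is an assumption)."""
    return mse(pred_rates, true_rates) + weight * arrhenius_residual(
        pred_rates, temps_k, activation_energy)
```

In a real training loop the same structure would be expressed in an autodiff framework so that gradients flow through both terms.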

5. Validation

K-fold cross-validation, out-of-distribution testing, and benchmarking against the NORSOK M-506 and de Waard-Milliams models.
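
A minimal sketch of two of these ingredients follows: a de Waard-Milliams baseline and a plain k-fold index splitter. The baseline uses the commonly quoted simplified form of the correlation (temperature in °C, CO2 partial pressure in bar, rate in mm/year); the coefficients should be checked against the original publication before any real use. NORSOK M-506 is not reproduced here, and the splitter uses contiguous folds without shuffling, which is an assumption of this example.

```python
import math


def de_waard_milliams(temp_c, p_co2_bar):
    """Baseline CO2 corrosion rate in mm/year (commonly quoted simplified form).

    log10(rate) = 5.8 - 1710 / (273 + t) + 0.67 * log10(pCO2)
    """
    log_rate = 5.8 - 1710.0 / (273.0 + temp_c) + 0.67 * math.log10(p_co2_bar)
    return 10.0 ** log_rate


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size
```

Comparing model error against such a baseline on each held-out fold gives a simple sanity check: a learned model that cannot beat a closed-form correlation on in-distribution data is unlikely to generalize.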

6. Deployment

REST API, edge inference on monitoring hardware, and continuous retraining with new field data.

Data Quality & Governance

Strict data governance ensuring reproducibility, regulatory compliance, and client confidentiality.

Data Lineage & Provenance

Every record carries full lineage metadata (source ID, collection method, ingestion timestamp, processing version, and quality flag), backed by a complete audit trail.
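
The five lineage fields could be modeled as an immutable record attached to each data point. The class name, field names, and example values below are hypothetical; only the list of fields comes from the description above.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Lineage:
    """Lineage metadata for one record (hypothetical field names)."""
    source_id: str
    collection_method: str
    ingestion_timestamp: str  # ISO 8601 string, UTC assumed
    processing_version: str
    quality_flag: str


def attach_lineage(record, lineage):
    """Return a copy of the record with lineage metadata embedded."""
    return {**record, "lineage": asdict(lineage)}
```

Freezing the dataclass means lineage cannot be edited after creation, which fits the intent of an audit trail.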

Anonymization & Privacy

K-anonymity (k≥5) on quasi-identifiers, differential privacy for aggregated statistics, and anonymization of client data before it enters shared training pools.
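
The k-anonymity condition can be checked directly: every combination of quasi-identifier values must appear in at least k records. The sketch below assumes records are dictionaries and that the caller names the quasi-identifier columns; it covers only the check, not the generalization or suppression needed to reach k, and not differential privacy.

```python
from collections import Counter


def satisfies_k_anonymity(records, quasi_identifiers, k=5):
    """True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(
        tuple(rec[q] for q in quasi_identifiers) for rec in records
    )
    return all(count >= k for count in groups.values())
```

For example, five records sharing the same region and steel grade pass with k=5, while adding a single record with a unique combination makes the check fail.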

Version Control

Semantic versioning (MAJOR.MINOR.PATCH) with immutable data lake storage and full rollback capability. Every model training run logs the exact dataset version.
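
A small sketch of how a training run might pin its dataset version: the version string is parsed into an integer tuple so versions compare numerically, and the run log stores the exact string. The function names and the log layout are assumptions for illustration.

```python
def parse_version(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints for ordered comparison."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def log_training_run(run_id, dataset_name, dataset_version):
    """Build a run-log entry pinning the exact dataset version (hypothetical layout)."""
    parse_version(dataset_version)  # raises ValueError if malformed
    return {
        "run_id": run_id,
        "dataset": dataset_name,
        "dataset_version": dataset_version,
    }
```

Comparing parsed tuples rather than raw strings avoids the usual pitfall where "2.10.0" sorts before "2.9.3" lexicographically.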

Request Dataset Access

Access curated, physics-validated corrosion datasets for your AI research and prediction models.