Structured, physics-validated datasets covering electrochemical, environmental, and material variables for machine learning-based corrosion prediction.
Three curated benchmark datasets validated against physics-based corrosion models for PI-GNN training pipelines.
CO2-driven corrosion rates in carbon steel under varying flow, temperature, and partial pressure conditions from laboratory and field measurements.
Cross-environment corrosion degradation data: atmospheric, marine, soil-buried, and industrial chemical environments across ferrous and non-ferrous alloys.
Time-series field sensor data from 47 corrosion monitoring probe installations in operational oil, gas, and water injection systems.
Publicly available datasets commonly used in corrosion prediction research, evaluated for completeness and ML suitability.
Canonical variable schema used by CorrosionAI for cross-dataset reproducibility and benchmarking.
| Variable | Unit | Typical Range |
|---|---|---|
| Temperature | °C | -10 to 200 |
| CO2 Partial Pressure | bar | 0.0 to 30.0 |
| pH | — | 1.0 to 14.0 |
| Dissolved Oxygen | mg/L | 0.0 to 12.0 |
| H2S Concentration | ppm | 0 to 5,000 |
| Variable | Unit | Typical Range |
|---|---|---|
| Flow Velocity | m/s | 0.0 to 20.0 |
| Wall Shear Stress | Pa | 0.0 to 500.0 |
| Variable | Unit | Typical Range |
|---|---|---|
| Steel Grade | Categorical | API 5L, AISI, UNS |
| Chromium Content | wt% | 0.0 to 30.0 |
| Carbon Content | wt% | 0.01 to 1.5 |
| Variable | Unit | Typical Range |
|---|---|---|
| Inhibitor Type | Categorical | Film-forming, Neutralizing |
| Inhibitor Concentration | ppm | 0 to 500 |
| Variable | Unit | Typical Range |
|---|---|---|
| Corrosion Rate | mm/year | 0.001 to 50.0 |
| Pitting Rate | mm/year | 0.0 to 20.0 |
| Mass Loss | g/m² | 0.0 to 5,000 |
Six-stage pipeline from raw data ingestion through production deployment.
Automated ETL from laboratory, field sensors, and published literature with unit harmonization and deduplication.
Outlier detection (IQR + domain rules), missing value imputation (MICE), and thermodynamic consistency checks.
Dimensionless groups (Re, Sc, Sh), Pourbaix diagram encoding, and graph topology construction from reaction networks.
Graph neural network training with physics loss terms: Arrhenius constraint, mass balance, Nernst equation.
K-fold cross-validation, out-of-distribution testing, and benchmarking against NORSOK M-506 and de Waard-Milliams.
REST API, edge inference on monitoring hardware, continuous retraining with new field data.
Strict data governance ensuring reproducibility, regulatory compliance, and client confidentiality.
Every record carries full lineage metadata: source ID, collection method, ingestion timestamp, processing version, and quality flag with complete audit trail.
K-anonymity (k≥5) on quasi-identifiers, differential privacy in aggregated statistics, and client data anonymization before entering shared training pools.
Semantic versioning (MAJOR.MINOR.PATCH) with immutable data lake storage and full rollback capability. Every model training run logs the exact dataset version.