oceaniq.md

CASE 03 / 05

OceanIQ

AI Data Platform · Pipeline

MVP · 2025

[Figure: unified schema diagram]

3+ sources unified · 1 schema output · built for a case study

overview.md

Indian Ocean data lives in incompatible silos: CMLRE records, Angria Bank surveys and marine-mammal sightings, each with its own format and quirks. OceanIQ is the pipeline that ingests those fragments, reconciles them, and lands them in a single clean schema you can actually query. It was built around a business case study on ocean-data accessibility.

problem.txt

The data exists, but it's unusable: different formats, units, naming, and granularity per source. Any analysis starts with weeks of manual cleaning. The goal was to make 'ask a question of all of it at once' possible.

architecture.drawio

  1. Ingest — readers for each source (CMLRE, Angria Bank, marine-mammal surveys).
  2. Normalise — units, naming, and types are standardised per field.
  3. Reconcile — records are aligned and merged with AI-assisted matching.
  4. Land — everything resolves into one analysis-ready schema.
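
The four stages can be sketched as a small pipeline object. This is a minimal illustration, not the project's code: the `Pipeline` class, the `Record` alias and every field name are hypothetical, and each stage is passed in as a plain function so the reconcile step could be rule-based or AI-assisted.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical record shape: a flat dict of field name -> value.
Record = dict


@dataclass
class Pipeline:
    """Minimal four-stage pipeline: ingest -> normalise -> reconcile -> land."""
    readers: dict[str, Callable[[], Iterable[Record]]]  # one reader per source
    normalise: Callable[[str, Record], Record]          # per-source naming/unit fixes
    reconcile: Callable[[list[Record]], list[Record]]   # align/merge across sources
    schema: tuple[str, ...]                             # columns of the unified output

    def run(self) -> list[Record]:
        # 1. Ingest + 2. Normalise: read each source and standardise as we go,
        # tagging each record with its origin so provenance survives the merge.
        normalised = []
        for source, read in self.readers.items():
            for raw in read():
                rec = self.normalise(source, raw)
                rec["source"] = source
                normalised.append(rec)
        # 3. Reconcile: merge records that describe the same observation.
        merged = self.reconcile(normalised)
        # 4. Land: project every record onto the unified schema;
        # fields a source never provided become None.
        return [{col: rec.get(col) for col in self.schema} for rec in merged]


if __name__ == "__main__":
    pipe = Pipeline(
        readers={"cmlre": lambda: [{"Depth_ft": 33.0}],
                 "angria": lambda: [{"depth_m": 12.0}]},
        # Toy normaliser: feet -> metres when only a feet column exists.
        normalise=lambda src, r: {"depth_m": r.get("depth_m", r.get("Depth_ft", 0) * 0.3048)},
        reconcile=lambda recs: recs,  # identity: no merging in this toy run
        schema=("source", "depth_m"),
    )
    print(pipe.run())
```

Keeping the output projection in the last step means the schema is declared once, in one place, which matches the idea that the schema is the real product.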

dataset.csv

Heterogeneous public Indian oceanographic sources — CMLRE datasets, Angria Bank surveys, and marine-mammal sighting records — each arriving in its own format and resolution.

source          type             challenge
CMLRE           tabular records  naming + units
Angria Bank     survey data      granularity
Marine mammals  sightings        sparse / irregular
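
The "naming + units" row is the easiest to illustrate: a per-source mapping from raw column names to unified names, each paired with a converter. The sketch below assumes hypothetical column names and units (`Depth (ft)`, `depth_m` and so on) that are not taken from the real CMLRE or Angria Bank files.

```python
# Hypothetical per-source mapping: raw column -> (unified name, converter).
# Column names and units are illustrative, not from the real datasets.
FIELD_MAP = {
    "cmlre": {
        "Lat": ("lat", float),
        "Long": ("lon", float),
        "Depth (ft)": ("depth_m", lambda v: float(v) * 0.3048),  # feet -> metres
    },
    "angria_bank": {
        "latitude": ("lat", float),
        "longitude": ("lon", float),
        "depth_m": ("depth_m", float),
    },
}


def normalise(source: str, raw: dict) -> dict:
    """Rename fields and convert units for one raw record from `source`.

    Unknown columns are kept under an 'unmapped.' prefix instead of being
    dropped, so a new or renamed column (format drift) stays visible.
    """
    mapping = FIELD_MAP[source]
    out = {}
    for key, value in raw.items():
        if key in mapping:
            name, convert = mapping[key]
            # Treat empty cells as missing rather than failing the conversion.
            out[name] = convert(value) if value not in (None, "") else None
        else:
            out[f"unmapped.{key}"] = value
    return out
```

Granularity and sparse sightings need more than a lookup table (aggregation or interpolation), so this sketch deliberately covers only the first row.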

challenges.log

  • No shared key across sources — reconciliation had to be inferred.
  • Silent format drift inside a single 'source' over time.
  • Deciding what 'clean' means when ground truth itself is messy.
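
With no shared key, a match has to be inferred from the data itself. The project used AI-assisted matching; the sketch below swaps in a simpler, deterministic rule for illustration only: two records are treated as the same observation when they fall within an assumed distance and time window. The field names `lat`, `lon` and `time` and both thresholds are hypothetical.

```python
import math
from datetime import datetime, timedelta


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def is_same_event(a, b, max_km=5.0, max_gap=timedelta(hours=6)):
    """Infer that two records describe one observation when they are close in
    both space and time. The thresholds are illustrative assumptions."""
    return (abs(a["time"] - b["time"]) <= max_gap
            and haversine_km(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_km)


def match_pairs(left, right, **kw):
    """Return (i, j) index pairs of candidate matches between two sources.
    Brute force, O(n*m): fine for a sketch; a spatial index is needed at scale."""
    return [(i, j) for i, a in enumerate(left) for j, b in enumerate(right)
            if is_same_event(a, b, **kw)]
```

A rule like this is transparent but brittle near the thresholds, which is one reason a learned or AI-assisted matcher can help, provided its output is checked.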

lessons-learned.md

  • Schema design is the real product — the model is downstream of it.
  • AI-assisted cleaning saves time but needs guardrails and spot-checks.
  • A use case (the case study) keeps a data project from boiling the ocean.
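
One way to put "guardrails and spot-checks" into practice is to run every AI-cleaned record through hard plausibility rules, then draw a reproducible random sample of the accepted ones for a human to review. This is a sketch under assumed bounds and field names (`BOUNDS`, `triage` are hypothetical), not the checks the project actually ran.

```python
import random

# Hypothetical plausibility bounds for unified-schema fields (illustrative).
BOUNDS = {"lat": (-90.0, 90.0), "lon": (-180.0, 180.0), "depth_m": (0.0, 11_000.0)}


def passes_guardrails(rec):
    """Reject a record if any field it has falls outside its hard bounds."""
    for name, (lo, hi) in BOUNDS.items():
        v = rec.get(name)
        if v is not None and not (lo <= v <= hi):
            return False
    return True


def triage(cleaned, sample_rate=0.05, seed=0):
    """Split AI-cleaned records into accepted / rejected, and pick a seeded
    random sample (at least one record) of the accepted ones for spot-checks."""
    accepted = [r for r in cleaned if passes_guardrails(r)]
    rejected = [r for r in cleaned if not passes_guardrails(r)]
    k = max(1, round(len(accepted) * sample_rate)) if accepted else 0
    spot_check = random.Random(seed).sample(accepted, k)
    return accepted, rejected, spot_check
```

Seeding the sampler makes a spot-check reproducible, so a reviewer can revisit exactly the same records later.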

future-work.md

  • Incremental ingestion as new survey data lands.
  • A query UI on top of the unified schema.
  • Data-quality scoring per record.
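
Per-record quality scoring is still future work, so the following is only one possible shape for it: a weighted share of unified-schema fields that are present and pass a validity check. The weights, field names and the `valid` callback are all hypothetical.

```python
# Hypothetical weights per unified-schema field; real weights are a design choice.
WEIGHTS = {"lat": 0.3, "lon": 0.3, "time": 0.2, "depth_m": 0.2}


def quality_score(rec, valid=lambda name, v: True):
    """Score in [0, 1]: weighted share of fields that are present and pass `valid`."""
    total = sum(WEIGHTS.values())
    earned = sum(w for name, w in WEIGHTS.items()
                 if rec.get(name) is not None and valid(name, rec[name]))
    return earned / total
```

For example, a record with only `lat` and `lon` would score 0.6, and passing a `valid` function that rejects negative depths would stop an implausible depth from counting.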
Aditya Dixit · Jaipur, India · Set in IBM Plex Serif & Mono · © 2026