CASE 03 / 05
AI Data Platform · Pipeline
overview.md
India's oceanographic data lives in incompatible silos (CMLRE records, Angria Bank surveys, marine-mammal sightings), each with its own format and quirks. OceanIQ is the pipeline that ingests those fragments, reconciles them, and lands them in a single clean schema you can actually query. The project was built around a business case study on ocean-data accessibility.
problem.txt
The data exists, but it is unusable as-is: every source has its own formats, units, naming, and granularity, so any analysis starts with weeks of manual cleaning. The goal was to make it possible to ask a question of all of it at once.
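To make the naming and unit mismatch concrete, here is a minimal sketch of the kind of per-record normalization such a pipeline needs. It is not OceanIQ's actual code: the `Observation` schema, the `COLUMN_ALIASES` table, and the column headers in it are all assumptions made for illustration, since the case study does not list the real source fields.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

FEET_TO_METRES = 0.3048

# Hypothetical header aliases; the real source headers are not given here.
COLUMN_ALIASES = {
    "Lat": "lat", "latitude": "lat",
    "Lon": "lon", "Long": "lon", "longitude": "lon",
    "Depth(m)": "depth_m", "depth": "depth_m",
    "Depth(ft)": "depth_ft",
}


@dataclass(frozen=True)
class Observation:
    """Assumed target schema: one row per observation, metric units."""
    source: str
    lat: float
    lon: float
    depth_m: Optional[float]


def normalize(source: str, raw: Dict[str, Any]) -> Observation:
    """Rename source-specific columns and convert depth to metres."""
    row = {COLUMN_ALIASES.get(key, key): value for key, value in raw.items()}
    if "depth_ft" in row:
        row["depth_m"] = float(row.pop("depth_ft")) * FEET_TO_METRES
    depth = row.get("depth_m")
    return Observation(
        source=source,
        lat=float(row["lat"]),
        lon=float(row["lon"]),
        depth_m=None if depth is None else float(depth),
    )
```

Calling `normalize("cmlre", {"Lat": "15.5", "Lon": "73.25", "Depth(ft)": "100"})` returns an `Observation` with numeric coordinates and depth in metres, while a sighting record with no depth column comes back with `depth_m=None` rather than failing.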
architecture.drawio
dataset.csv
Heterogeneous public Indian oceanographic sources — CMLRE datasets, Angria Bank surveys, and marine-mammal sighting records — each arriving in its own format and resolution.
| source | type | challenge |
|---|---|---|
| CMLRE | tabular records | naming + units |
| Angria Bank | survey data | granularity |
| Marine mammals | sightings | sparse / irregular |
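The granularity and sparsity challenges in the table can be illustrated with a small sketch that snaps every record to a shared grid cell and calendar month, so that dense survey rows and irregular sightings land in comparable buckets. This is one possible approach under stated assumptions, not a description of how OceanIQ does it: the 1-degree cell size, the monthly resolution, and the `lat`/`lon`/`date` record keys are all hypothetical.

```python
import math
from collections import defaultdict
from datetime import date
from typing import Any, Dict, List, Tuple

GridKey = Tuple[float, float, int, int]


def grid_key(lat: float, lon: float, day: date, cell_deg: float = 1.0) -> GridKey:
    """Snap a point to the south-west corner of its grid cell and its month."""
    cell_lat = math.floor(lat / cell_deg) * cell_deg
    cell_lon = math.floor(lon / cell_deg) * cell_deg
    return (cell_lat, cell_lon, day.year, day.month)


def bin_records(
    records: List[Dict[str, Any]], cell_deg: float = 1.0
) -> Dict[GridKey, List[Dict[str, Any]]]:
    """Group records from any source by shared grid cell and month."""
    bins: Dict[GridKey, List[Dict[str, Any]]] = defaultdict(list)
    for rec in records:
        key = grid_key(rec["lat"], rec["lon"], rec["date"], cell_deg)
        bins[key].append(rec)
    return dict(bins)
```

With this in place, a survey row at (16.45, 72.80) and a sighting at (16.10, 72.95) in the same month fall into the same bucket, which is what makes a cross-source query possible. The cell size trades resolution against how many buckets end up empty for the sparse sources.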
challenges.log
lessons-learned.md
future-work.md