@felnne
Last active September 14, 2022 10:14
BAS Data Workflows - MAGIC submission

BAS Data Publishing Workflows

Index

Ideas, examples & other material from MAGIC relating to joint PDC/MAGIC discussions on data publishing workflows.

  • Notes
  • Diagrams
    • Project specific
    • General
  • Flow Charts
  • Workflows
  • Presentations

General Notes (MAGIC)

Principles:

  • processes in workflows should be automatable
  • components in workflows should be interchangeable (defined interfaces, clear responsibilities etc.)
  • lower-level components should be more durable (i.e. stores more durable than analytical tools)
  • common systems wherever possible (i.e. we should use the same systems across workflows where we can)
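To illustrate what "defined interfaces" could mean for interchangeable components, here is a minimal sketch, assuming a store only needs to deposit and retrieve artefacts. `ArtefactStore`, `InMemoryStore` and `deposit_artefact` are hypothetical names, not existing BAS systems.

```python
from typing import Protocol


class ArtefactStore(Protocol):
    """Hypothetical interface: what any store must offer to be interchangeable."""

    def deposit(self, name: str, content: bytes) -> str:
        """Store content and return an identifier for it."""
        ...

    def retrieve(self, identifier: str) -> bytes:
        """Return the content previously stored under an identifier."""
        ...


class InMemoryStore:
    """Toy store for illustration only; it stands in for a real, durable store."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def deposit(self, name: str, content: bytes) -> str:
        identifier = f"memory://{name}"
        self._objects[identifier] = content
        return identifier

    def retrieve(self, identifier: str) -> bytes:
        return self._objects[identifier]


def deposit_artefact(store: ArtefactStore, name: str, content: bytes) -> str:
    # Workflow code depends only on the interface, so any conforming store
    # (e.g. an object store or a file server) could be swapped in unchanged.
    return store.deposit(name, content)
```

Because `Protocol` uses structural typing, a store only has to provide matching methods; it does not need to inherit from a shared base class, which keeps components loosely coupled.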

Scope

In scope:

  • distribution & publication of resources

Out of scope:

  • editing & QA
  • analysis and integration (AI Ready?)
  • replication to remote locations & federation

Resource types in scope:

  • datasets
  • products
  • features

Resource types out of scope:

  • services
  • series & collections
  • projects & initiatives

Resources

Two types?

  1. discrete/static - e.g. ADD releases, results of data collection
  2. continuous/dynamic - e.g. asset positions, met observations

Does this difference boil down to resource maintenance frequency?
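If the answer is yes, the distinction could be derived from the maintenance frequency already recorded in metadata rather than tracked separately. A minimal sketch, assuming the ISO 19115 `MD_MaintenanceFrequencyCode` values are used; the cut-off between "dynamic" and "static" is an arbitrary assumption.

```python
# Values from the ISO 19115 MD_MaintenanceFrequencyCode codelist. Which values
# count as "dynamic" is an assumption made for this sketch, not a BAS rule.
DYNAMIC = {"continual", "daily", "weekly"}
STATIC = {
    "fortnightly", "monthly", "quarterly", "biannually",
    "annually", "asNeeded", "irregular", "notPlanned",
}


def resource_kind(maintenance_frequency: str) -> str:
    """Classify a resource as discrete/static or continuous/dynamic."""
    if maintenance_frequency in DYNAMIC:
        return "continuous/dynamic"
    if maintenance_frequency in STATIC:
        return "discrete/static"
    # 'unknown' (or any other value) gives no basis for a decision.
    return "undetermined"
```

For example, ADD releases every 6 months ("biannually") would come out as discrete/static, while asset positions ("continual") would come out as continuous/dynamic.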

Data Access Systems flowchart

Note: This is almost certainly out of scope for this exercise (as it goes way beyond PDC/MAGIC). I've included it more for context and possible future discussion.

| Resource Type | Class | Data Access System |
|---|---|---|
| Paper | Published | NORA |
| Publication | Published | NORA |
| Report | Published | NORA |
| Paper | Not Published | MODES |
| Publication | Not Published | MODES |
| Report | Not Published | MODES |
| Software | Published - Modelling Related | NERC Models Thing |
| Software | Published - Not Modelling Related | GitHub (@antarctica) |
| Software | Not Published - Modelling Related | GitLab |
| Software | Not Published - Not Modelling Related | GitLab |
| Datasets | Restricted | ? |
| Datasets | Not Restricted | PDC Ramadda |
| Features | Restricted | ? |
| Features | Not Restricted | ? |
| Products | Restricted | ? |
| Products | Not Restricted | Files.BAS |
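The access-system table above can be read as a lookup from (resource type, class) to a system. A minimal sketch, assuming the labels are used verbatim as keys and `None` stands for the "?" entries; `access_system` and `unresolved` are hypothetical helpers.

```python
from typing import Optional

# The access-system table encoded as a lookup; None marks a '?' entry where
# no system has been identified yet. Key labels follow the table.
ACCESS_SYSTEMS: dict[tuple[str, str], Optional[str]] = {
    ("Paper", "Published"): "NORA",
    ("Publication", "Published"): "NORA",
    ("Report", "Published"): "NORA",
    ("Paper", "Not Published"): "MODES",
    ("Publication", "Not Published"): "MODES",
    ("Report", "Not Published"): "MODES",
    ("Software", "Published - Modelling Related"): "NERC Models Thing",
    ("Software", "Published - Not Modelling Related"): "GitHub (@antarctica)",
    ("Software", "Not Published - Modelling Related"): "GitLab",
    ("Software", "Not Published - Not Modelling Related"): "GitLab",
    ("Datasets", "Restricted"): None,
    ("Datasets", "Not Restricted"): "PDC Ramadda",
    ("Features", "Restricted"): None,
    ("Features", "Not Restricted"): None,
    ("Products", "Restricted"): None,
    ("Products", "Not Restricted"): "Files.BAS",
}


def access_system(resource_type: str, resource_class: str) -> Optional[str]:
    """Return the data access system for a resource, or None if undecided."""
    key = (resource_type, resource_class)
    if key not in ACCESS_SYSTEMS:
        raise ValueError(f"no rule for {resource_type!r} / {resource_class!r}")
    return ACCESS_SYSTEMS[key]


def unresolved() -> list[tuple[str, str]]:
    """List the combinations that still have no access system ('?' entries)."""
    return [key for key, system in ACCESS_SYSTEMS.items() if system is None]
```

Encoding the table this way makes the gaps explicit: `unresolved()` returns the four restricted or feature combinations that still need a system.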

For restricted products/datasets:

Workflows Summary

General Workflow

As a very high-level, generic workflow:

  1. generate resource
  2. generate metadata about resource
  3. generate distribution artefacts from resource
  4. deposit artefacts into a store
  5. link artefacts to metadata (via a data access system)
  6. publish artefacts & metadata (making them available to one or more audiences)
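The six generic steps can be sketched as one function that records what each step produces. This is a toy sketch under stated assumptions: the `Release` record, the `store://` locations and the single `title` metadata field are all hypothetical stand-ins for real components.

```python
from dataclasses import dataclass, field


@dataclass
class Release:
    """Hypothetical record of one resource passing through the generic workflow."""

    resource: str
    metadata: dict[str, str] = field(default_factory=dict)
    artefacts: list[str] = field(default_factory=list)
    locations: dict[str, str] = field(default_factory=dict)
    audiences: list[str] = field(default_factory=list)


def run_generic_workflow(resource: str, formats: list[str], audiences: list[str]) -> Release:
    # 1. generate resource: assumed to have happened already; we take its name.
    release = Release(resource=resource)
    # 2. generate metadata about the resource (a single title field here).
    release.metadata["title"] = resource
    # 3. generate one distribution artefact per requested format.
    release.artefacts = [f"{resource}.{fmt}" for fmt in formats]
    # 4. deposit artefacts into a (hypothetical) store, recording where each went.
    release.locations = {a: f"store://{a}" for a in release.artefacts}
    # 5. link artefacts to metadata, as a data access system would.
    release.metadata["distributions"] = " ".join(release.locations.values())
    # 6. publish: record which audiences the release is made available to.
    release.audiences = list(audiences)
    return release
```

For example, `run_generic_workflow("coastline", ["gpkg", "shp"], ["public"])` yields two artefacts, both linked from the metadata and available to one audience.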

Specific Workflows

At a more granular level, there may be workflows for:

  • publishing a new unrestricted dataset
  • publishing a new embargoed dataset
  • publishing a new restricted dataset
  • publishing an updated unrestricted dataset
  • publishing an updated restricted dataset
  • lifting the embargo on a published dataset
  • publishing a new unrestricted feature
  • publishing a new restricted feature
  • publishing an updated unrestricted feature
  • publishing an updated restricted feature
  • publishing a new unrestricted product
  • publishing a new restricted product
  • publishing an updated unrestricted product
  • publishing an updated restricted product

Note: Though these workflows are distinct, their constituent parts should not be, i.e. these are separate recipes using mostly the same ingredients.
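One way to keep the "recipes" honest is to define each workflow as an ordered list of steps drawn from a single shared set, and check that no recipe uses a one-off step. A minimal sketch; the step names and the choice of recipes are hypothetical, covering only a few of the workflows listed above.

```python
from collections import Counter

# Hypothetical shared components ("ingredients"); names are illustrative only.
SHARED_STEPS = {
    "generate_metadata", "generate_artefacts", "deposit_artefacts",
    "link_artefacts", "restrict_access", "apply_embargo", "lift_embargo",
    "supersede_previous", "publish",
}

CORE = ["generate_metadata", "generate_artefacts", "deposit_artefacts", "link_artefacts"]

# Hypothetical "recipes": a few of the workflows above as ordered step lists.
RECIPES: dict[str, list[str]] = {
    "new unrestricted dataset": CORE + ["publish"],
    "new embargoed dataset": CORE + ["apply_embargo", "publish"],
    "new restricted dataset": CORE + ["restrict_access", "publish"],
    "updated restricted product": CORE + ["restrict_access", "supersede_previous", "publish"],
    "lift embargo on a dataset": ["lift_embargo", "publish"],
}


def unknown_steps(recipes: dict[str, list[str]]) -> set[str]:
    """Steps a recipe uses that are not shared components (ideally none)."""
    return {s for steps in recipes.values() for s in steps if s not in SHARED_STEPS}


def step_usage(recipes: dict[str, list[str]]) -> Counter:
    """Count how many recipes reuse each shared component."""
    return Counter(s for steps in recipes.values() for s in set(steps))
```

Here `unknown_steps(RECIPES)` is empty and `publish` is reused by all five recipes, which is the property the note asks for.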

Specific examples with context

  • publishing a new unrestricted dataset:
    • publishing a frequently updated, unrestricted, data series (e.g. Sentinel 1 archive, multiple times per day)
    • publishing a periodically updated, unrestricted, data series (e.g. ADD, every 6 months)
  • publishing a new restricted dataset:
    • publishing a periodically updated, restricted, data series (e.g. Traverse tracks, every season)
  • publishing a new unrestricted product:
    • publishing a one-off, unrestricted, map product (e.g. A68 visualisation)
  • publishing an updated restricted product:
    • publishing a periodically updated, restricted, map series (e.g. Air Ops Planning Maps, every season)

PDC use-cases:

  1. Apex embedded maps
  2. Data Portal web maps

Updating resources

Updating a resource can, I think, be broken down into two scenarios:

  1. correction of corrupt/missing information:
    • e.g. incorrectly encoded or only partial export
    • information actually replaced
    • a note that this happened is recorded in the metadata, but there is no way to access the original (possibly because it was corrupt)
    • only possible for unpublished information
  2. correction of incorrect/misleading information:
    • e.g. incorrect interpolation method used
    • information registered as a new revision
    • original information preserved in case any analysis was based on it (reproducibility)
    • metadata record for the original information marked as superseded or withdrawn, depending on significance

This can be summarised as:

Note: There was originally going to be a third case, for revising data due to new information. However, on thinking this through, this should be treated as a new resource, with a series record or similar being the thing that's revised.
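The two update scenarios, plus the "new information" case from the note, can be sketched as a small decision function. Assumptions, not from the notes: corrupt information that is already published falls back to a new revision (since in-place replacement is only possible before publication), and a "significant" error leads to the original being withdrawn rather than superseded. The reason labels are hypothetical.

```python
from enum import Enum
from typing import Optional


class UpdateAction(Enum):
    REPLACE_IN_PLACE = "replace information and note the correction in metadata"
    NEW_REVISION = "register a new revision and keep the original"
    NEW_RESOURCE = "create a new resource and revise the series record"


def choose_update(
    reason: str, published: bool, significant: bool = False
) -> tuple[UpdateAction, Optional[str]]:
    """Return the action and the new status of the original metadata record.

    reason is a hypothetical label: 'corrupt', 'incorrect' or 'new_information'.
    """
    if reason == "new_information":
        # Treated as a new resource; the original record stays current.
        return UpdateAction.NEW_RESOURCE, None
    if reason == "corrupt" and not published:
        # In-place replacement is only possible for unpublished information.
        return UpdateAction.REPLACE_IN_PLACE, None
    if reason in ("corrupt", "incorrect"):
        # Assumption: significant errors withdraw the original, others supersede it.
        status = "withdrawn" if significant else "superseded"
        return UpdateAction.NEW_REVISION, status
    raise ValueError(f"unknown reason: {reason!r}")
```

For example, correcting an interpolation error in a published dataset gives a new revision with the original marked "superseded", or "withdrawn" if the error is significant.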

felnne commented Nov 23, 2021

bas-data-infrastructure-v0-1-0-Ex - BedMap3 drawio

felnne commented Sep 14, 2022

bas-data-infrastructure-v0-2-0 drawio

felnne commented Sep 14, 2022

bas-data-infrastructure-v0-3-0-MAGIC drawio
