Publication standards for Code Ocean¶
This Standards checklist is intended to make the code publication standards set out in the Open Science Policy document clearer and more usable in the context of code hosted on Code Ocean. The Code review process documents how we will help each other meet these standards. The Additional best practices below go beyond the minimum standards to further promote reuse and reproducibility, and should be preferred when possible.
Standards checklist¶
Use this checklist on GitHub
This link generator lets you open an issue on your capsule's GitHub repository with the checklist pre-filled. You can use the issue to track tasks or coordinate with a reviewer.
Capsules and repositories¶
Capsules (or pipelines) for all processing steps, from raw data to figures [3]
Working copy of capsule shared internally and linked to a public GitHub repository within the AIND or AIBS GitHub organization.
Released version of capsule added to manuscript collection (requires author and description in capsule metadata, sync to GitHub, and reproducible run).
Reproducible run script generates all outputs[4] (if manual steps are unavoidable, include step-by-step instructions and automate as much as possible).
Figure outputs saved to the results folder, with filenames indicating the corresponding figure number (and subpanel letter if possible).
Code consolidated in the code folder, with unused code removed or clearly documented.
Explicitly specified (pinned) versions for all direct imports and other critical dependencies[5]
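The figure-naming and version-pinning items above lend themselves to small automated checks. The following is a minimal sketch, not part of any Code Ocean tooling: the function names `figure_filename` and `unpinned_requirements`, the `fig<N><panel>` naming pattern, and the assumption that dependencies are listed in pip requirements format with `==` pins are all illustrative choices.

```python
def figure_filename(figure: int, panel: str = "", ext: str = "png") -> str:
    """Build a filename encoding the figure number and optional subpanel letter.

    The pattern fig<number><PANEL>.<ext> is a hypothetical convention.
    """
    return f"fig{figure}{panel.upper()}.{ext}"


def unpinned_requirements(requirements_text: str) -> list[str]:
    """Return requirement lines that do not pin an exact version with '=='.

    Assumes pip requirements format; comments, blank lines, and option
    lines (starting with '-') are skipped.
    """
    unpinned = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned
```

With matplotlib, a figure could then be written with something like `fig.savefig(results_dir / figure_filename(2, "b"))`, where `results_dir` points at the capsule's results folder. An empty list from `unpinned_requirements` suggests every listed dependency is pinned, although it says nothing about dependencies installed by other means.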
Data¶
All AIND data stored as external data assets (aind-open-data), with complete metadata
All intermediate results stored as external data assets (aind-open-data), with processing metadata added (Tutorial).
All data from external sources documented and downloadable with clear instructions from a stable data repository, or mirrored in aind-open-data.
If many individual assets are used, create combined data assets to organize them by data modality or type
All data assets (combined if needed) added to public collection – intermediate results should be included on a case-by-case basis.
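When planning combined data assets, it can help to group asset names by modality first. The sketch below assumes that each asset name begins with a modality or platform label followed by an underscore; that naming pattern, the placeholder names in the usage note, and the function name `group_by_modality` are assumptions for illustration. Creating the combined assets themselves would still be done through Code Ocean.

```python
from collections import defaultdict


def group_by_modality(asset_names):
    """Group asset names by the label before the first underscore.

    Assumes names look like '<modality>_<rest>'; a name without an
    underscore forms its own group.
    """
    groups = defaultdict(list)
    for name in asset_names:
        modality = name.split("_", 1)[0]
        groups[modality].append(name)
    # Return a plain dict with sorted keys and sorted members for stable output.
    return {modality: sorted(names) for modality, names in sorted(groups.items())}
```

For example, calling it on the placeholder names `ecephys_001`, `behavior_001`, and `ecephys_002` yields one group for `behavior` and one for `ecephys`, each of which is a candidate for a single combined asset.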
Readme and other docs¶
Includes links to manuscript, GitHub repo, and release capsule (add the latter before making a second release).[1]
Briefly describes all experimental data types and other inputs.
Briefly describes all non-figure outputs (intermediate results).
Briefly explains key analysis steps (reference relevant code by file or function names).
LICENSE file at top level of repo (MIT license).
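Parts of this list can be checked automatically before a review starts. The sketch below is one possible pre-review check, under stated assumptions: the function name `check_repo_docs` and the substrings in `LINK_HINTS` used to guess whether the readme links to the GitHub repo and the capsule are hypothetical, and a passing check does not replace reading the readme.

```python
from pathlib import Path

# Hypothetical substrings used to guess whether the expected links are present.
LINK_HINTS = ("github.com", "codeocean")


def check_repo_docs(repo_root):
    """Return a list of problems found with the top-level LICENSE and readme."""
    root = Path(repo_root)
    problems = []

    if not (root / "LICENSE").is_file():
        problems.append("missing LICENSE file at top level")

    # Accept README, README.md, readme.rst, and similar names.
    readmes = sorted(
        p for p in root.iterdir()
        if p.is_file() and p.name.lower().startswith("readme")
    )
    if not readmes:
        problems.append("missing README")
    else:
        text = readmes[0].read_text(encoding="utf-8", errors="replace").lower()
        for hint in LINK_HINTS:
            if hint not in text:
                problems.append(f"README does not mention {hint}")
    return problems
```

An empty list means the basic files are in place; the descriptions of data types, outputs, and analysis steps still need a human reader.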
Code review process¶
Reviewer will adopt the perspective of an external user, and check that code meets these standards and is reproducible and reusable (able to identify and adjust key parameters, not necessarily understand each line).
Reviewer must be a scientist who has not contributed to the capsule, but may be a manuscript author or someone otherwise knowledgeable about the general approach. A SciComp team member may be looped in for advice and oversight as necessary.
Required edits will typically consist of moving/renaming, documenting, commenting, and otherwise making code intelligible without much refactoring.
Review must be completed and issues addressed before bioRxiv submission or other publicity.[6]
Reviewer should also offer suggestions for refactoring to match best practices in this document and resolve code style issues (using flake8 as a guide).[7]
Additional best practices¶
Ask SciComp to create an internal CO collection early in the manuscript process, keep it up to date as components are added, then make it public on bioRxiv submission.
Consolidate capsules that process the same data type with compatible dependencies, and consider separating code between capsules if it has very different dependencies or hardware requirements.
Whenever possible, use a Dockerfile generated from the environment builder, with minimal code in postinstall.
Remove any user- or editor-specific configuration, including untracked config files like the .vscode directory.[1]
Trim down any very large files (including notebooks), and consider amending the git history as well if the repository size is very large.
Where capsules depend on other capsules, link them in a pipeline.
For manual steps, save outputs as derived assets that downstream steps can use as input for a reproducible run, and document in readme, metadata and paper methods.
For long-running steps, note approximate runtime in documentation; for very long steps, save outputs as derived assets also.
Separate reusable components of the code into functions in Python modules.
Code used by multiple capsules should be moved to shared libraries.
Minimize manual postprocessing of figures (e.g. adjustments in Illustrator).
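To find candidates for the file-trimming item above, a short script can list the largest files in a working copy. This is a minimal sketch: the function name `find_large_files` and the 5 MB default threshold are assumptions, not a documented limit, and the script only inspects the current working tree, not the git history.

```python
from pathlib import Path


def find_large_files(repo_root, max_bytes=5_000_000):
    """Return (relative_path, size_in_bytes) pairs for files above max_bytes.

    Files inside the .git directory are skipped. Results are sorted from
    largest to smallest.
    """
    root = Path(repo_root)
    large = []
    for path in root.rglob("*"):
        relative = path.relative_to(root)
        if ".git" in relative.parts:
            continue
        if path.is_file():
            size = path.stat().st_size
            if size > max_bytes:
                large.append((relative.as_posix(), size))
    return sorted(large, key=lambda item: item[1], reverse=True)
```

Notebooks with embedded outputs are a common source of large files, so clearing outputs before committing is often enough to bring them under whatever threshold the team chooses.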