SciOps

Next-Level Open Science?

Joao O. Santos (FP-UL)

2023-03-28

Powered By

FP-UL logo

UL logo

QHELP logo

Erasmus+ logo

Acknowledgments

  • Prof. Sérgio (FP-UL)
  • Inês Ramalhete

  • André Vaz (FP-UL)
  • Philip Metz, former Education Evangelist at GitLab (@Metzinaround)

  • Prof. Sara Hagá (FP-UL)
  • Prof. Leonel Garcia-Marques (FP-UL)

  • Prof. Barbara Sarnecka (UC Irvine)
  • Prof. Marcelo Camerlo (ICS-UL)

SciOps: Next-Level Open Science

  • Let’s start with the last part

Open Science

WARNINGS I

  • Manage your expectations: you’re in for a crazy unstructured ride!

  • No warranty: use at your own risk!

WARNINGS II

  • Some issues are controversial and you might strongly disagree with my opinions

  • Feel free to (strongly) disagree

  • Please refrain from insulting my mother

    • There’s little she could have done to stop this

PART I - Context

Research Ethics

  • Participants’ rights

  • Scientific Fraud

  • Questionable Research Practices

  • Epistemological values

Our Focus for Today

  • Questionable Recommended Research Practices

  • (Some) Epistemological values

    • Namely openness
    • Error-correcting settings (see Mayo, 1996)

Defining “Open”

  • What does it mean to you?

  • To me, in this context, it means:

    • Transparent
    • No barriers to entry

Defining “Science”

  • Is a matter for another day…

  • I love philosophy of science but I’ll spare you that lecture…

  • For today, it suffices to say that science is now done in teams

Diving Deeper

Open Science

  • Increase transparency

  • Share materials/resources

Open Science - Practices

  • Preregistration

  • Data Sharing

  • Supplementary Materials

  • Preprints

Open Science - Tools

Ok…Some Philosophy…

Epistemological Virtues

  • Reason/Logic

  • Valuing knowledge

  • Independence

  • Tolerance (see Santos, Hagá, & Garcia-Marques, in prep)

Issues

  • Fraud

  • Replication Crisis

  • Perverse Incentive Structures

  • Lack of Diversity

    • Commonly discussed (e.g., ethnic, abilities, age, demographic)
    • Less frequently discussed (e.g., political, religious, methodological)

Other Issues

  • Lack of good organizational policies and work ethic

  • Undocumented knowledge and procedures (makes onboarding hard)

  • Imposter syndrome

  • Loneliness

Ok…Some More Philosophy…

Epistemological Arguments

  • These issues can compromise the error-correction of scientific communities (see Mayo, 1996)

  • But we must also be careful not to compromise it with our intervention (Garcia-Marques)

PART II - Intermission

Introducing RUGGED

History I

  • Prof. Marcelo Camerlo’s suggestion of creating an R user group stuck with me…

  • When participating in a WriteOn group, coordinated by Sara Hagá at FP-UL, I thought the model could be adapted to R

History II

  • After giving workshops on R we started grouping together on RUGGED

  • We’re on our second season now (see the episode guide)!

Platforms

Onboarding

  • If you want to join us follow these simple steps.

  • You don’t have to speak Portuguese, nor be affiliated with any Portuguese institution

  • But we do meet during Lisbon/GMT+0 working hours…

Back To Your Regularly Scheduled Programming

PART III - Open Science in Practice

What Can Open Science Do for Science?

  • Not much…
    • It cannot fix the perverse incentives (see Santos, Hagá, & Garcia-Marques, in prep)
  • But it can remove barriers if:
    • done in good faith
    • done right

Levels of Open Science

  • Not an evidence-based taxonomy, just a guide for discussion

Level -1

  • Scientific results are only published in (expensive) paid journals

  • Descriptions of methods and analysis are short

  • No (or very few) supplementary materials

No Judgment

  • There were good reasons to do so in the past:
    • Journals were mostly published in print (space issues)
    • The tooling was not mainstream

Level 0

  • Descriptions of methods and analysis are (mostly) enough to replicate

  • Open channels for communication

  • Papers are indexed and easier to search for

Level 1 - Actual Open Science

  • Supplementary materials:

    • Data sharing
    • Sharing analysis scripts
    • Sharing experimental files and materials
  • Open Access

  • Preprints

  • Modern tooling: osf.io, R/python, etc…

Level 1 - Some Notes

Data sharing must be done with respect to participants’ rights:

  • Consent forms should mention anonymised data might be shared

  • Data must be anonymised before being shared (err on the safe side)

  • Triple check the data is anonymised before putting it online

  • You can make the data anonymisation script public
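
The anonymisation steps above can be captured in a small script that is itself shareable. What follows is a minimal sketch in base R, not a complete anonymisation procedure: the function `anonymise()`, the column names (`participant_id`, `name`, `email`) and the toy data are all hypothetical, and real datasets may contain indirect identifiers that need separate handling.

```r
# Toy raw data; every column name and value here is made up for illustration
raw_data <- data.frame(
  participant_id = c("id_a17", "id_b42", "id_c08"),
  name = c("Participant A", "Participant B", "Participant C"),
  email = c("a@example.org", "b@example.org", "c@example.org"),
  Age = c(31, 27, 45),
  Performance = c(0.82, 0.91, 0.77),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drops direct identifiers and recodes participant IDs
anonymise <- function(data, direct_identifiers, id_column) {
  # Drop the columns that identify participants directly
  anonymised <- data[, !(names(data) %in% direct_identifiers), drop = FALSE]
  # Replace the original IDs with arbitrary codes assigned in shuffled order
  original_ids <- unique(anonymised[[id_column]])
  new_codes <- sprintf("P%03d", sample(seq_along(original_ids)))
  anonymised[[id_column]] <- new_codes[match(anonymised[[id_column]], original_ids)]
  # Shuffle the rows so that row position does not reveal collection order
  anonymised <- anonymised[sample(nrow(anonymised)), , drop = FALSE]
  rownames(anonymised) <- NULL
  anonymised
}

anonymised_data <- anonymise(raw_data, c("name", "email"), "participant_id")

# One of the checks to run before sharing: no direct identifiers remain
stopifnot(!any(c("name", "email") %in% names(anonymised_data)))
```

A check like the final `stopifnot()` only covers the columns you listed, so it complements, rather than replaces, inspecting the data by hand.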

Level 1 - Some Notes

If you’re sharing your files, try to make sure the names of columns, variables, files, etc. are meaningful to others

# This is not clear to anyone but yourself...
lr <- lm(P ~ A, dataset)

# This is better
regression <- lm(Performance ~ Age, dataset)

# Or just use comments to explain the meaning
# Linear regression (`lr`) for the effect of age (`A`) on Performance (`P`)
lr <- lm(P ~ A, dataset)

Level 1 - Some Notes

If you’re going to share your analysis code make it portable:

# This is not portable!
# (setwd() expects a directory, not a file)
setwd("/home/your_user_name/folders/that/exist/only/on/your/machine/")

# It could be made portable by using a predefined project structure
setwd("./project/stats/analyzes/")

# Importing data through an IDE's interface (e.g., RStudio) is not portable
# If the dataset variable appears without being defined first that's not portable
dataset$Age <- as.numeric(dataset$Age)
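
One way to avoid both absolute paths and IDE-only imports is to build every path relative to the project folder and to load the data explicitly in the script. The sketch below assumes a hypothetical `data/dataset.csv` inside the project; a temporary directory stands in for the project folder so that the example runs on any machine.

```r
# A temporary directory stands in for the project root in this sketch;
# in a real project this would be the folder the scripts are run from.
project_root <- file.path(tempdir(), "project")
dir.create(file.path(project_root, "data"), recursive = TRUE, showWarnings = FALSE)

# file.path() joins path components in a way that works on every OS
dataset_path <- file.path(project_root, "data", "dataset.csv")

# Write a toy dataset so the example is self-contained (values are made up)
write.csv(
  data.frame(Age = c(31, 27, 45), Performance = c(0.82, 0.91, 0.77)),
  dataset_path,
  row.names = FALSE
)

# The dataset is defined in the script itself, not through an IDE menu
dataset <- read.csv(dataset_path)
dataset$Age <- as.numeric(dataset$Age)
```

Anyone who copies the project folder can then run the script without editing a single path.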

Level 1 - Some Notes

If you’re going to share your code try and clean it up:

  • But it’s better to share non-clean code than to not share code at all

Level 1 - Some Notes

# You viewed the variable and that was important in writing the script
# Still, it serves no purpose in the analysis so delete it.
View(dataset$Age)

# Same goes for finding the classes, dimensions of variables, etc...
class(dataset$Age)
dim(dataset)

Level 1 - Some Notes

# Avoid tons of package imports when you only use one
library(afex)
library(dplyr)
library(effectsize)
library(tidyr)
library(WRS2)

# If I only use `dplyr` in that script then I should only have:
library(dplyr)
# If you don't know which packages you use and for what
# you should try and find that out
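
Two habits can help with finding out which packages a script really uses: calling functions as `package::function`, and saving the session details alongside the results. The sketch below is only an illustration; it uses the `stats` and `utils` packages that ship with R, and the toy data and the `session_info.txt` file name are assumptions.

```r
# Calling functions as package::function makes each dependency explicit,
# even without a library() call
toy_data <- data.frame(
  Age = c(20, 30, 40, 50),
  Performance = c(1.0, 1.5, 2.1, 2.4)
)
regression <- stats::lm(Performance ~ Age, data = toy_data)

# Names of the add-on packages attached to this session (NULL if there are none)
session <- utils::sessionInfo()
attached_packages <- names(session$otherPkgs)

# Save the session details so that others can match R and package versions
session_file <- file.path(tempdir(), "session_info.txt")
writeLines(utils::capture.output(print(session)), session_file)
```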

Level 1 - Some Notes

regression <- lm(DV ~ IV, dataset)

# Same goes for print statements of needless information
print(class(dataset$IV))
print(regression)
# You probably only want this last print statement
print(summary(regression))

# Better yet, save that to a file rather than printing it to stdout
sink("../results/regression_summary.txt")
print(summary(regression))
sink()
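
A text summary is easy to read but awkward to reuse. As a complement to the text file, the coefficient table can also be saved as a CSV that other scripts can read back. This is a minimal sketch: the toy data, the temporary `results` folder and the file name `regression_coefficients.csv` are assumptions made for illustration.

```r
# Toy data so the example is self-contained (values are made up)
toy_data <- data.frame(
  IV = c(1, 2, 3, 4, 5),
  DV = c(2.1, 3.9, 6.2, 7.8, 10.1)
)
regression <- lm(DV ~ IV, toy_data)

# Create the output folder if it does not exist yet
results_dir <- file.path(tempdir(), "results")
dir.create(results_dir, recursive = TRUE, showWarnings = FALSE)

# coef(summary(...)) returns the estimates, standard errors, t and p values
coefficients_table <- as.data.frame(coef(summary(regression)))
coefficients_file <- file.path(results_dir, "regression_coefficients.csv")
write.csv(coefficients_table, coefficients_file)
```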

Next Level - SciOps

Disclaimer

  • This might be crazy

  • No warranty: use at your own risk

  • I had little time to research and cite similar projects:

    • But I’ll try to link some references at the end

SciOps

  • Using cutting-edge computer science tools for Open Science

SciOps - Tooling

  • Revision control with git and modern code forges (e.g., gitlab.com, github.com)

  • Continuous Integration/Continuous Delivery or Deployment:

    • Pipelines of jobs/tasks that run every time files are updated
    • Automatic update and deployment of relevant artifacts/outputs
  • DevOps: Keeping track of dependencies and configurations with containers
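
To make the pipeline idea concrete, here is a sketch of the kind of entry-point script a pipeline job could call (for example with `Rscript run_all.R`) every time files are updated. It is an assumption about how such a job might be organised, not a description of the SciOps project: `run_pipeline()`, the script names and the output file are all hypothetical.

```r
# Hypothetical helper: runs the analysis scripts in order.
# If any script raises an error, source() stops; under Rscript this gives a
# non-zero exit status, which pipeline tools generally treat as a failed job.
run_pipeline <- function(script_paths) {
  for (script in script_paths) {
    message("Running ", script)
    # Each script gets a fresh environment, so it cannot silently rely on
    # variables created by an earlier script
    source(script, local = new.env())
  }
  invisible(TRUE)
}

# Two toy scripts in a temporary folder stand in for a real project
scripts_dir <- file.path(tempdir(), "scripts")
dir.create(scripts_dir, showWarnings = FALSE)
output_file <- normalizePath(
  file.path(tempdir(), "pipeline_output.txt"),
  winslash = "/", mustWork = FALSE
)
writeLines("x <- 1 + 1", file.path(scripts_dir, "01_prepare.R"))
writeLines(
  sprintf("writeLines('analysis done', '%s')", output_file),
  file.path(scripts_dir, "02_analyse.R")
)

# Numbered file names give a predictable execution order
scripts <- sort(list.files(scripts_dir, pattern = "\\.R$", full.names = TRUE))
run_pipeline(scripts)
```

Keeping the whole analysis behind a single entry point means the same command works on a laptop and inside an automated job.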

SciOps - Workflow

  • TODO lists, notes, discussions are tracked per project

  • Changes and decisions are linked to discussions, TODOs and notes.

  • Collaboration can (but need not) be done in the open

  • Files are licensed under Open Source licenses

SciOps - Potential

A new level of:

  • Transparency

  • Archival

  • Collaboration

  • Automation

SciOps - Today

gitlab.com/rggd/SciOps

gitlab.com/rggd/sciops_workshop

PART IV - Exercises

Exercise I

Level Up Your Open Science Game

Two Options

  • Quick (can be done now)

  • Requiring planning and commitment (can be planned now)

Quick

Quick: Examples

  • Start drafting a mock preregistration

  • Check your scripts for non-portable code

  • Clean up your scripts

  • Start drafting a longer “Procedure” or “Analysis” for your paper

  • Start taking notes of your decisions so far

  • Email your colleagues to start a community

Planning

Planning: How To

  1. Write down ideas

  2. Make a TODO list

  3. Sort TODOs:

    • Mark the ones that need to be done in order (blocking or blocked)
    • Mark the ones that are actionable
  4. Pick your next task

  5. Choose a commitment device

  • (See Allen, 2015)

Planning: Examples

What do I need to do to:

  • Make all my materials public?

  • Use programming for data analysis?

  • Use Open Source software for my research (analysis and experiment running)?

PART V - Outro

References

Debrief

  • All feedback is welcome!

  • You can comment on this issue

  • You can email me any feedback you have

  • We can stay and chat after the session

Thank You

  • Thank you for being here!