SciOps

Next-Level OpenScience?

Joao O. Santos (FP-UL)

2023-03-28

Powered By

FP-UL logo

UL logo

QHELP logo

Erasmus+ logo

Acknowledgments

Prof. Sérgio (FP-UL)
Inês Ramalhete

Acknowledgments

André Vaz (FP-UL)
Philip Metz, former Education Evangelist at GitLab (@Metzinaround)

Acknowledgments

Prof. Sara Hagá (FP-UL)
Prof. Leonel Garcia-Marques (FP-UL)

Acknowledgments

Prof. Barbara Sarnecka (UC Irvine)
Prof. Marcelo Camerlo (ICS-UL)

Acknowledgments

RUGGED

SciOps: Next-Level Open Science

Let’s start with the last part

Open Science

WARNINGS I

Manage your expectations: you’re in for a crazy unstructured ride!
No warranty: use at your own risk!

WARNINGS II

Some issues are controversial and you might strongly disagree with my opinions
Feel free to (strongly) disagree
Please refrain from insulting my mother
- There’s little she could have done to stop this

PART I - Context

Research Ethics

Participants’ rights
Scientific Fraud
Questionable Research Practices
Epistemological values

Our Focus for Today

~~Questionable~~ Recommended Research Practices
(Some) Epistemological values
- Namely openness
- Error-correcting settings (see Mayo, 1996)

Defining “Open”

What does it mean to you?
To me, in this context, it means:
- Transparent
- No barriers to entry

Defining “Science”

Is a matter for another day…
I love philosophy of science but I’ll spare you that lecture…
For now it suffices to say that Science today is done in teams

Diving Deeper

Open Science

Increase transparency
Share materials/resources

Open Science - Practices

Preregistration
Data Sharing
Supplementary Materials
Preprints

Open Science - Tools

aspredicted.org
osf.io
Web hosting
Version control
SciOps? (git?, containers?, RMarkdown?, pandoc?)

Ok…Some Philosophy…

Epistemological Virtues

Reason/Logic
Valuing knowledge
Independence
Tolerance (see Santos, Hagá, & Garcia-Marques, in prep)

Issues

Fraud
Replication Crisis
Perverse Incentive Structures
Lack of Diversity
- Commonly discussed (e.g., ethnic, abilities, age, demographic)
- Less frequently discussed (e.g., political, religious, methodological)

Other Issues

Lack of good organizational policies and work ethic
Undocumented knowledge and procedures (makes onboarding hard)
Imposter syndrome
Loneliness

Ok…Some More Philosophy…

Epistemological Arguments

These issues can compromise the error-correction of scientific communities (see Mayo, 1996)
But we must also be careful not to compromise it with our intervention (Garcia-Marques)

PART II - Intermission

Introducing RUGGED

History I

Prof. Marcelo Camerlo's suggestion of creating an R user group stuck with me…
When participating in a WriteOn group, coordinated by Sara Hagá at FP-UL, I thought the model could be adapted to R

History II

After giving workshops on R we started grouping together on RUGGED
We’re on our second season now (see the episode guide)!

Platforms

Main website: rggd.gitlab.io
Mailing list: rugged_mailing_list@googlegroups.com
Discord server
GitLab Group

Onboarding

If you want to join us follow these simple steps.
You don’t have to speak Portuguese, nor be affiliated with any Portuguese institution
But we do meet on Lisbon/GMT0 working hours…

Back To Your Regularly Scheduled Programming

PART III - Open Science in Practice

What Can Open Science Do for Science?

Not much…
- It cannot fix the perverse incentives (see Santos, Hagá, & Garcia-Marques, in prep)
But it can remove barriers if:
- done in good faith
- done right

Levels of Open Science

Not an evidence-based taxonomy, just a guide for discussion

Level -1

Scientific results are only published in (expensive) paid journals
Descriptions of methods and analysis are short
No (or very few) supplementary materials

No Judgment

There were good reasons to do so in the past:
- Journals were mostly published in print (space issues)
- The tooling was not mainstream

Level 0

Descriptions of methods and analysis are (mostly) enough to replicate
Open channels for communication
Papers are indexed and easier to search for

Level 1 - Actual Open Science

Supplementary materials:
- Data sharing
- Sharing analysis scripts
- Sharing experimental files and materials
Open Access
Preprints
Modern tooling: osf.io, R/python, etc…

Level 1 - Some Notes

Data sharing must be done with respect to participants’ rights:

Consent forms should mention anonymised data might be shared
Data must be anonymised before being shared (err on the safe side)
Triple check the data is anonymised before putting it online
You can make the data anonymisation script public
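
A minimal sketch of what a public anonymisation script could look like. The data, the column names, and the `anonymise` function are all made up for illustration. Dropping directly identifying columns is only a first step: variables such as age can still identify someone in a small sample, so this does not guarantee anonymity.

```r
# Hypothetical raw data: every column name here is an illustrative assumption
raw_data <- data.frame(
  name = c("Ana", "Rui", "Eva"),
  email = c("ana@example.org", "rui@example.org", "eva@example.org"),
  Age = c(23, 35, 41),
  Performance = c(0.81, 0.64, 0.77)
)

# Columns assumed to directly identify a participant
identifying_columns <- c("name", "email")

anonymise <- function(data, identifying_columns) {
  # Drop the directly identifying columns
  anonymised <- data[, !(names(data) %in% identifying_columns), drop = FALSE]
  # Shuffle the rows so row order cannot be matched to collection order
  anonymised <- anonymised[sample(nrow(anonymised)), , drop = FALSE]
  # Reset row names and add neutral participant codes
  rownames(anonymised) <- NULL
  anonymised$participant <- sprintf("P%03d", seq_len(nrow(anonymised)))
  anonymised
}

anonymised_data <- anonymise(raw_data, identifying_columns)
```

Only `anonymised_data` would be written to the shared folder; the raw file stays private.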

Level 1 - Some Notes

If you’re sharing your files, try and make sure the names for the columns, variables, files, etc…, are meaningful to others

# This is not clear to anyone but yourself...
lr <- lm(P ~ A, dataset)

# This is better
regression <- lm(Performance ~ Age, dataset)

# Or just use comments to explain the meaning
# Linear regression (`lr`) for the effect of age (`A`) on Performance (`P`)
lr <- lm(P ~ A, dataset)

Level 1 - Some Notes

If you’re going to share your analysis code make it portable:

# This is not portable!
setwd("/home/your_user_name/folders/that/exist/only/on/your/machine/")

# It could be made portable by using a predefined project structure
setwd("./project/stats/analyzes/")

# Importing data through an IDE's interface (e.g., RStudio) is not portable
# If `dataset` is used without being created in the script, others cannot run it
dataset$Age <- as.numeric(dataset$Age)
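
One possible way to make the import explicit and portable is to build the path relative to the project folder and read the file in the script itself. This is only a sketch: the `data/dataset.csv` layout and the `load_dataset` helper are assumptions, and the throwaway project in the temporary folder exists only so the example runs anywhere.

```r
# Hypothetical helper: reads <project_dir>/data/dataset.csv
load_dataset <- function(project_dir = ".") {
  path <- file.path(project_dir, "data", "dataset.csv")
  if (!file.exists(path)) {
    stop("Data file not found: ", path)
  }
  read.csv(path)
}

# Throwaway project folder, created only for this demonstration
demo_project <- file.path(tempdir(), "demo_project")
dir.create(file.path(demo_project, "data"), recursive = TRUE, showWarnings = FALSE)
write.csv(
  data.frame(Age = c(23, 35), Performance = c(0.81, 0.64)),
  file.path(demo_project, "data", "dataset.csv"),
  row.names = FALSE
)

# The dataset is now created by the script, not by clicking in an IDE
dataset <- load_dataset(demo_project)
dataset$Age <- as.numeric(dataset$Age)
```

In a real project, `load_dataset()` would be called from the project folder with no argument.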

Level 1 - Some Notes

If you’re going to share your code try and clean it up:

But it’s better to share non-clean code than to not share code at all

Level 1 - Some Notes

# You viewed the variable and that was important in writing the script
# Still, it serves no purpose in the analysis so delete it.
View(dataset$Age)

# Same goes for finding the classes, dimensions of variables, etc...
class(dataset$Age)
dim(dataset)

Level 1 - Some Notes

# Avoid tons of package imports when you only use one
library(afex)
library(dplyr)
library(effectsize)
library(tidyr)
library(WRS2)

# If I only use `dplyr` in that script then I should only have:
library(dplyr)
# If you don't know which packages you use and for what
# you should try and find that out
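
One way to keep track of which package each function comes from is the `package::function` notation. The sketch below uses only packages that ship with R (`stats` and `utils`) so it runs without installing anything; the data are made up.

```r
# Toy data, made up for illustration
dataset <- data.frame(
  Age = c(21, 25, 30, 34, 40, 47),
  Performance = c(0.52, 0.58, 0.61, 0.70, 0.69, 0.80)
)

# `stats::lm` makes it explicit that `lm` comes from the `stats` package
regression <- stats::lm(Performance ~ Age, data = dataset)

# `utils::head` comes from `utils`; neither call needs a `library()` line
first_rows <- utils::head(dataset, n = 3)

# Names of the packages currently attached in this session
attached_packages <- .packages()
```

The same notation works for installed packages (e.g., `dplyr::filter`), which makes unused `library()` calls easier to spot.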

Level 1 - Some Notes

regression <- lm(DV ~ IV, dataset)

# Same goes for print statements of needless information
print(class(dataset$IV))
print(regression)
# You probably only want this last print statement
print(summary(regression))

# Better yet, save that to a file rather than printing it to stdout
sink("../results/regression_summary.txt")
print(summary(regression))
sink()

Next Level - SciOps

Disclaimer

This might be crazy
No warranty: use at your own risk
I had little time to research and cite similar projects:
- But I’ll try to link some references in the end

SciOps

Using cutting-edge computer science tools for Open Science

SciOps - Tooling

Revision control with git and modern code forges (e.g., gitlab.com, github.com)
Continuous Integration/Continuous Delivery or Deployment:
- Pipelines of jobs/tasks that run every time files are updated
- Automatic update and deployment of relevant artifacts/outputs
DevOps: Keeping track of dependencies and configurations with containers
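
As a rough illustration, a pipeline job could run a script like the one below every time files are updated and keep its outputs as artifacts. This is a sketch under assumptions: the toy data, the `results` folder, and the file names are hypothetical. Recording `sessionInfo()` is a lighter stand-in for containers, not a replacement: it documents the R version and packages but does not reproduce the environment.

```r
# Hypothetical output folder; a CI job could store its contents as artifacts
results_dir <- file.path(tempdir(), "results")
dir.create(results_dir, recursive = TRUE, showWarnings = FALSE)

# Toy data, made up for illustration
dataset <- data.frame(
  Age = c(21, 25, 30, 34, 40, 47),
  Performance = c(0.52, 0.58, 0.61, 0.70, 0.69, 0.80)
)

regression <- stats::lm(Performance ~ Age, data = dataset)

# Save the model summary to a text file
summary_file <- file.path(results_dir, "regression_summary.txt")
writeLines(utils::capture.output(summary(regression)), summary_file)

# Record the R version and loaded packages next to the results
session_file <- file.path(results_dir, "session_info.txt")
writeLines(utils::capture.output(utils::sessionInfo()), session_file)
```

A job would typically call such a script with `Rscript`, writing to a folder inside the repository rather than to a temporary one.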

SciOps - Workflow

TODO lists, notes, discussions are tracked per project
Changes and decisions are linked to discussions, TODOs and notes.
Collaboration can (but need not) be done in the open
Files are licensed under Open Source licenses

SciOps - Potential

A new level of:

Transparency
Archival
Collaboration
Automation

SciOps - Today

gitlab.com/rggd/SciOps

gitlab.com/rggd/sciops_workshop

PART IV - Exercises

Exercise I

Level Up Your Open Science Game

Two Options

Quick (can be done now)
Requiring planning and commitment (can be planned now)

Quick

Quick: Examples

Create an account on osf.io
Create an account on gitlab.com
Join RUGGED

Quick: Examples

Contribute to a RUGGED project's “help wanted” issues
Fork SciOps and change a file to see what happens

Quick: Examples

Start drafting a mock preregistration
Check your scripts for non-portable code
Clean up your scripts

Quick: Examples

Start drafting a longer “Procedure” or “Analysis” for your paper
Start taking notes of your decisions so far
Email your colleagues to start a community

Planning

Planning: How To

Write down ideas
Make a TODO list
Sort TODOs:
- Mark the ones that need to be done in order (blocking or blocked)
- Mark the ones that are actionable
Pick your next task
Choose a commitment device

(See Allen, 2015)

Planning: Examples

What do I need to do to:

Make all my materials public?
Use programming for data analysis?
Use Open Source software for my research (analysis and experiment running)?

PART V - Outro

References

Debrief

All feedback is welcome!
You can comment on this issue
You can email me any feedback you have
We can stay and chat after the session

Thank You

Thank you for being here!