
data-steps

JSON array

dataSpec describes the criteria used to find data when searching a database (NiDB, for example, since this pipeline format is usually connected to a database). dataSpec is a JSON array of the following variables: Search variables specify how to find data in a database, and Export variables specify how the data is exported.

JSON variables

| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| AssociationType | string |  | [Search] Either study or subject. |
| BehavioralDirectory | string |  | [Export] If BehavioralDirectoryFormat writes data to a subdirectory, the subdirectory is given this name. |
| BehavioralDirectoryFormat | string |  | [Export] One of nobeh, behroot, behseries, behseriesdir. |
| DataFormat | string |  | [Export] One of native, dicom, nifti3d, nifti4d, analyze3d, analyze4d, bids. |
| Enabled | bool |  | [Search] true if the step is enabled, false otherwise. |
| Gzip | bool |  | [Export] true if converted NIfTI data should be gzipped, false otherwise. |
| ImageType | string |  | [Search] Comma-separated list of image types, often derived from the DICOM ImageType tag (0008,0008). |
| DataLevel | string |  | [Search] Where the data comes from: nearestintime or samestudy. |
| Location | string |  | [Export] Directory, relative to the analysisroot, where this data item will be written. |
| Modality | string |  | [Search] Modality to search for. |
| NumberBOLDreps | string |  | [Search] If SeriesCriteria is set to usecriteria, search based on this option. |
| NumberImagesCriteria | string |  | [Search] |
| Optional | bool |  | [Search] true if this data step is optional; false if it is required, in which case the analysis will not run if the data step is not found. |
| Order | number |  | The numerical order of this data step. |
| PreserveSeries | bool |  | [Export] true to preserve series numbers, false to assign new ordinal numbers. |
| PrimaryProtocol | bool |  | [Search] true if this data step determines the primary study, from which subsequent analyses are run. |
| Protocol | string |  | [Search] Comma-separated list of protocol name(s). |
| SeriesCriteria | string |  | [Search] Criteria for which series are downloaded if more than one matches: all, first, last, largest, smallest, usecriteria. |
| UsePhaseDirectory | bool |  | [Export] true to write data to a subdirectory based on the phase-encoding direction. |
| UseSeriesDirectory | bool |  | [Export] true to write each series to its own directory, false to write data to the root export directory. |

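To make the structure concrete, here is a minimal sketch of a dataSpec array containing a single data step. The variable names come from the table above; all values are illustrative placeholders, not taken from a real package.

```json
[
  {
    "Order": 1,
    "Enabled": true,
    "Optional": false,
    "AssociationType": "study",
    "DataLevel": "samestudy",
    "Modality": "MR",
    "Protocol": "MPRAGE,T1w",
    "ImageType": "ORIGINAL,PRIMARY",
    "SeriesCriteria": "first",
    "PrimaryProtocol": true,
    "DataFormat": "nifti4d",
    "Gzip": true,
    "Location": "data",
    "UseSeriesDirectory": false,
    "PreserveSeries": false,
    "UsePhaseDirectory": false,
    "BehavioralDirectoryFormat": "nobeh"
  }
]
```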

pipelines

JSON array

Pipelines are the methods used to analyze data after it has been collected. In other words, the experiment provides the methods to collect the data, and the pipelines provide the methods to analyze it once collected.

JSON Variables

| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| ClusterType | string |  | Compute cluster engine (sge or slurm). |
| ClusterUser | string |  | Username used to submit jobs. |
| ClusterQueue | string |  | Queue to which jobs are submitted. |
| ClusterSubmitHost | string |  | Hostname to which jobs are submitted. |
| CompleteFiles | JSON array |  | JSON array of complete files, with paths relative to analysisroot. |
| CreateDate | datetime |  | Date the pipeline was created. |
| DataCopyMethod | string |  | How the data is copied to the analysis directory: cp, softlink, hardlink. |
| DependencyDirectory | string |  |  |
| DependencyLevel | string |  |  |
| DependencyLinkType | string |  |  |
| Description | string |  | Longer pipeline description. |
| DirectoryStructure | string |  |  |
| Directory | string |  | Directory where the analyses for this pipeline will be stored. Leave blank to use the default location. |
| Group | string |  | ID or name of a group on which this pipeline will run. |
| GroupType | string |  | Either subject or study. |
| Level | number |  | Subject-level analysis (1) or group-level analysis (2). |
| MaxWallTime | number |  | Maximum allowed clock (wall) time, in minutes, for the analysis to run. |
| ClusterMemory | number |  | Amount of memory, in GB, requested for a running job. |
| PipelineName | string |  | Pipeline name. |
| Notes | string |  | Extended notes about the pipeline. |
| NumberConcurrentAnalyses | number | 1 | Number of analyses allowed to run at the same time. This number is managed by NiDB and is separate from the grid-engine queue size. |
| ClusterNumberCores | number | 1 | Number of CPU cores requested for a running job. |
| ParentPipelines | string |  | Comma-separated list of parent pipelines. |
| ResultScript | string |  | Executable script run at completion of the analysis to find and insert results back into NiDB. |
| SubmitDelay | number |  | Delay, in hours, after the study datetime before submitting to the cluster. Allows time to upload behavioral data. |
| TempDirectory | string |  | Path to a temporary directory on a compute node, if one is used. |
| UseProfile | bool |  | true if using the profile option, false otherwise. |
| UseTempDirectory | bool |  | true if using a temporary directory, false otherwise. |
| Version | number | 1 | Version of the pipeline. |
| PrimaryScript | string |  |  |
| SecondaryScript | string |  |  |
| DataStepCount | number |  | Number of data steps. |
| VirtualPath | string |  | Path of this pipeline within the squirrel package. |
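For illustration, a single pipeline object might look like the sketch below. Field names follow the table above; the queue, user, host, and script names are placeholders. MaxWallTime is in minutes, so 4320 corresponds to the 72-hour limit used in the example script later in this section.

```json
{
  "PipelineName": "freesurferUnified6",
  "Description": "FreeSurfer cortical and subcortical reconstruction",
  "Version": 1,
  "Level": 1,
  "GroupType": "study",
  "ClusterType": "sge",
  "ClusterQueue": "all.q",
  "ClusterUser": "onrc",
  "ClusterSubmitHost": "cluster.example.org",
  "ClusterNumberCores": 1,
  "ClusterMemory": 8,
  "MaxWallTime": 4320,
  "NumberConcurrentAnalyses": 1,
  "DataCopyMethod": "cp",
  "PrimaryScript": "freesurferUnified6.sh",
  "DataStepCount": 1,
  "VirtualPath": "pipelines/freesurferUnified6"
}
```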


Directory structure

Files associated with this section are stored in the following directory. PipelineName is the unique name of the pipeline.

/pipelines/<PipelineName>
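For example, a pipeline named freesurferUnified6 (as in the scripts below) would store its files under:

/pipelines/freesurferUnified6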

Pipeline scripts

Details about how pipeline scripts are formatted for squirrel and NiDB

Pipeline scripts are meant to run in bash. They are traditionally written for a RHEL distribution such as CentOS or Rocky Linux. The scripts are bash-compliant, but have some nuances that allow them to run more effectively under an NiDB pipeline setup.

The bash script is interpreted before being run on a cluster: commands are added to your script so that it can check in and report its status to NiDB as it runs.

The script

Example script. Note that there is no need for a shebang line at the beginning (for example #!/bin/sh), because the interpreter is only interested in the commands being run.

export FREESURFER_HOME=/opt/freesurfer-6.0     #  The Freesurfer home directory (version) you want to use
export FSFAST_HOME=/opt/freesurfer-6.0/fsfast     #  Not sure if these next two are needed but keep them just in case
export MNI_DIR=/opt/freesurfer-6.0/mni     #  Not sure if these next two are needed but keep them just in case
source $FREESURFER_HOME/SetUpFreeSurfer.sh     #  MGH's shell script that sets up Freesurfer to run
export SUBJECTS_DIR={analysisrootdir}     #  Point to the subject directory you plan to use - all FS data will go there
freesurfer > {analysisrootdir}/version.txt     # {NOLOG} get the freesurfer version
perl /opt/pipeline/ImportFreesurferData.pl {analysisrootdir}/data analysis     #  import data. the perl program allows importing of multiple T1s
recon-all -hippocampal-subfields-T1 -no-isrunning -all -notal-check -subjid analysis     #  Autorecon all {PROFILE}
if tail -n 1 {analysisrootdir}/analysis/scripts/recon-all-status.log | grep 'finished without error' ; then touch {analysisrootdir}/reconallsuccess.txt; fi     # {NOLOG} {NOCHECKIN}
recon-all -subjid analysis -qcache     #  do the qcache step {PROFILE}

Before being submitted to the cluster, the script is passed through the NiDB interpreter, and the actual bash script will look like the one below. This example runs on subject S2907GCS, study 8, under the freesurferUnified6 pipeline. The resulting script is then submitted to the cluster.

The script as it is submitted to the cluster:

#!/bin/sh
#$ -N freesurferUnified6
#$ -S /bin/bash
#$ -j y
#$ -o /home/pipeline/onrc/data/pipeline/S2907GCS/8/freesurferUnified6/pipeline/
#$ -V
#$ -u onrc
#$ -l h_rt=72:00:00
LD_LIBRARY_PATH=/opt/pipeline/nidb/; export LD_LIBRARY_PATH;
echo Hostname: `hostname`
echo Username: `whoami`

/opt/pipeline/nidb/nidb cluster -u pipelinecheckin -a 3151385 -s started -m 'Cluster processing started'
cd /home/pipeline/onrc/data/pipeline/S2907GCS/8/freesurferUnified6;

/opt/pipeline/nidb/nidb cluster -u pipelinecheckin -a 3151385 -s processing -m 'processing step 1 of 10'
# The Freesurfer home directory (version) you want to use
echo Running export FREESURFER_HOME=/opt/freesurfer-6.0
export FREESURFER_HOME=/opt/freesurfer-6.0 >> /home/pipeline/onrc/data/pipeline/S2907GCS/8/freesurferUnified6/pipeline/Step1

/opt/pipeline/nidb/nidb cluster -u pipelinecheckin -a 3151385 -s processing -m 'processing step 2 of 10'
# Not sure if these next two are needed but keep them just in case
echo Running export FSFAST_HOME=/opt/freesurfer-6.0/fsfast
export FSFAST_HOME=/opt/freesurfer-6.0/fsfast >> /home/pipeline/onrc/data/pipeline/S2907GCS/8/freesurferUnified6/pipeline/Step2

/opt/pipeline/nidb/nidb cluster -u pipelinecheckin -a 3151385 -s processing -m 'processing step 3 of 10'
# Not sure if these next two are needed but keep them just in case
echo Running export MNI_DIR=/opt/freesurfer-6.0/mni
export MNI_DIR=/opt/freesurfer-6.0/mni >> /home/pipeline/onrc/data/pipeline/S2907GCS/8/freesurferUnified6/pipeline/Step3

/opt/pipeline/nidb/nidb cluster -u pipelinecheckin -a 3151385 -s processing -m 'processing step 4 of 10'
# MGH's shell script that sets up Freesurfer to run
echo Running source $FREESURFER_HOME/SetUpFreeSurfer.sh
source $FREESURFER_HOME/SetUpFreeSurfer.sh >> /home/pipeline/onrc/data/pipeline/S2907GCS/8/freesurferUnified6/pipeline/Step4

/opt/pipeline/nidb/nidb cluster -u pipelinecheckin -a 3151385 -s processing -m 'processing step 5 of 10'
# Point to the subject directory you plan to use - all FS data will go there
echo Running export SUBJECTS_DIR=/home/pipeline/onrc/data/pipeline/S2907GCS/8/freesurferUnified6
export SUBJECTS_DIR=/home/pipeline/onrc/data/pipeline/S2907GCS/8/freesurferUnified6 >> /home/pipeline/onrc/data/pipeline/S2907GCS/8/freesurferUnified6/pipeline/Step5

/opt/pipeline/nidb/nidb cluster -u pipelinecheckin -a 3151385 -s processing -m 'processing step 6 of 10'
# get the freesurfer version
echo Running freesurfer > /home/pipeline/onrc/data/pipeline/S2907GCS/8/freesurferUnified6/version.txt
freesurfer > /home/pipeline/onrc/data/pipeline/S2907GCS/8/freesurferUnified6/version.txt

/opt/pipeline/nidb/nidb cluster -u pipelinecheckin -a 3151385 -s processing -m 'processing step 7 of 10'
# import data. the perl program allows importing of multiple T1s
echo Running perl /opt/pipeline/ImportFreesurferData.pl /home/pipeline/onrc/data/pipeline/S2907GCS/8/freesurferUnified6/data analysis
perl /opt/pipeline/ImportFreesurferData.pl /home/pipeline/onrc/data/pipeline/S2907GCS/8/freesurferUnified6/data analysis >> /home/pipeline/onrc/data/pipeline/S2907GCS/8/freesurferUnified6/pipeline/Step7

/opt/pipeline/nidb/nidb cluster -u pipelinecheckin -a 3151385 -s processing -m 'processing step 8 of 10'
# Autorecon all {PROFILE}
echo Running recon-all -hippocampal-subfields-T1 -no-isrunning -all -notal-check -subjid analysis
/usr/bin/time -v recon-all -hippocampal-subfields-T1 -no-isrunning -all -notal-check -subjid analysis >> /home/pipeline/onrc/data/pipeline/S2907GCS/8/freesurferUnified6/pipeline/Step8
if tail -n 1 /home/pipeline/onrc/data/pipeline/S2907GCS/8/freesurferUnified6/analysis/scripts/recon-all-status.log | grep 'finished without error' ; then touch /home/pipeline/onrc/data/pipeline/S2907GCS/8/freesurferUnified6/reconallsuccess.txt; fi

/opt/pipeline/nidb/nidb cluster -u pipelinecheckin -a 3151385 -s processing -m 'processing step 10 of 10'
# do the qcache step {PROFILE}
echo Running recon-all -subjid analysis -qcache
/usr/bin/time -v recon-all -subjid analysis -qcache >> /home/pipeline/onrc/data/pipeline/S2907GCS/8/freesurferUnified6/pipeline/Step10

/opt/pipeline/nidb/nidb cluster -u pipelinecheckin -a 3151385 -s processing -m 'Processing result script'
# Running result script
echo Running perl /opt/pipeline/ParseFreesurferResults.pl -r /home/pipeline/onrc/data/pipeline/S2907GCS/8/freesurferUnified6 -p /home/pipeline/onrc/data/pipeline/S2907GCS/8/freesurferUnified6/analysis/stats -a 3151385     #  dump results back into ado2 > /home/pipeline/onrc/data/pipeline/S2907GCS/8/freesurferUnified6/pipeline/stepResults.log 2>&1
perl /opt/pipeline/ParseFreesurferResults.pl -r /home/pipeline/onrc/data/pipeline/S2907GCS/8/freesurferUnified6 -p /home/pipeline/onrc/data/pipeline/S2907GCS/8/freesurferUnified6/analysis/stats -a 3151385     #  dump results back into ado2 > /home/pipeline/onrc/data/pipeline/S2907GCS/8/freesurferUnified6/pipeline/stepResults.log 2>&1
chmod -Rf 777 /home/pipeline/onrc/data/pipeline/S2907GCS/8/freesurferUnified6
/opt/pipeline/nidb/nidb cluster -u pipelinecheckin -a 3151385 -s processing -m 'Updating analysis files'
/opt/pipeline/nidb/nidb cluster -u updateanalysis -a 3151385
/opt/pipeline/nidb/nidb cluster -u pipelinecheckin -a 3151385 -s processing -m 'Checking for completed files'
/opt/pipeline/nidb/nidb cluster -u checkcompleteanalysis -a 3151385
/opt/pipeline/nidb/nidb cluster -u pipelinecheckin -a 3151385 -s complete -m 'Cluster processing complete'
chmod -Rf 777 /home/pipeline/onrc/data/pipeline/S2907GCS/8/freesurferUnified6

How to interpret the altered script

  1. Details for the grid engine are added at the beginning

    • This includes max wall time, output directories, run-as user, etc.

    • #!/bin/sh
      #$ -N freesurferUnified6
      #$ -S /bin/bash
      #$ -j y
      #$ -o /home/pipeline/onrc/data/pipeline/S2907GCS/8/freesurferUnified6/pipeline/
      #$ -V
      #$ -u onrc
      #$ -l h_rt=72:00:00

  2. Each command is changed to include logging and check-ins

    • /opt/pipeline/nidb/nidb cluster -u pipelinecheckin -a 3151385 -s processing -m 'processing step 1 of 10'
      # The Freesurfer home directory (version) you want to use
      echo Running export FREESURFER_HOME=/opt/freesurfer-6.0
      export FREESURFER_HOME=/opt/freesurfer-6.0 >> /home/pipeline/onrc/data/pipeline/S2907GCS/8/freesurferUnified6/pipeline/Step1
    • nidb cluster -u pipelinecheckin reports the current step to the database. This is displayed on the Pipelines --> Analysis webpage (see the annotated call after this list)

    • Each command is also echoed to the grid engine log file, so you can check that log file for status

    • The output of each command is appended to its own per-step log file by the >> at the end of the last line
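A hedged reading of the check-in call pattern seen above; the flag meanings are inferred from the generated script, not from NiDB reference documentation:

```bash
# Check-in call as it appears in the generated script:
/opt/pipeline/nidb/nidb cluster -u pipelinecheckin -a 3151385 -s processing -m 'processing step 1 of 10'
#   -u pipelinecheckin   the operation to perform (report analysis status)
#   -a 3151385           the analysisID ({analysisid}) this job belongs to
#   -s processing        status value: started, processing, or complete
#   -m '...'             message displayed on the Pipelines --> Analysis webpage
```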

Pipeline Variables

There are a few pipeline variables that are interpreted by NiDB when running. Each variable is replaced with its value before the final script is written out, so each study on which a pipeline runs gets a different script, with different paths, IDs, and the other values listed below.

| Variable | Description |
| --- | --- |
| {NOLOG} | Does not append >> to the end of the command, so the command's output is not logged. |
| {NOCHECKIN} | Does not prepend the command with a check-in, and does not echo the command being run. This is useful (and necessary) when running multi-line commands such as for loops and if/then statements. |
| {PROFILE} | Prepends the command with a profiler to output information about CPU and memory usage. |
| {analysisrootdir} | The full path to the analysis root directory. Example: /home/user/thePipeline/S1234ABC/1/ |
| {subjectuid} | The UID of the subject being analyzed. Example: S1234ABC |
| {studynum} | The study number of the study being analyzed. Example: 2 |
| {uidstudynum} | UID and study number together. Example: S1234ABC2 |
| {pipelinename} | The pipeline name. |
| {studydatetime} | The study datetime. Example: 2022-07-04 12:34:56 |
| {first_ext_file} | Replaced with the first file (alphabetically) found with the ext extension. |
| {first_n_ext_files} | Replaced with the first N files (alphabetically) found with the ext extension. |
| {last_ext_file} | Replaced with the last file (alphabetically) found with the ext extension. |
| {all_ext_files} | Replaced with all files (alphabetically) found with the ext extension. |
| {command} | The command being run. Example: ls -l |
| {workingdir} | The current working directory. |
| {description} | The description of the command. This is anything following the #, also called a comment. |
| {analysisid} | The analysisID of the analysis. This is useful when inserting analysis results, as the analysisID is required to do that. |
| {subjectuids} | [Second-level analysis] List of subject UIDs. |
| {studydatetimes} | [Second-level analysis] List of studyDateTimes in the group. |
| {analysisgroupid} | [Second-level analysis] The analysisID. |
| {uidstudynums} | [Second-level analysis] List of UIDStudyNums. |
| {numsubjects} | [Second-level analysis] Total number of subjects in the group analysis. |
| {groups} | [Second-level analysis] List of group names contributing to the group analysis. Sometimes used when comparing groups. |
| {numsubjects_groupname} | [Second-level analysis] Number of subjects within the specified groupname. |
| {uidstudynums_groupname} | [Second-level analysis] Number of studies within the specified groupname. |
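As a sketch of how these variables combine in practice, the hypothetical lines below follow the same conventions as the example script earlier in this section; the commands and file names are illustrative only. NiDB replaces each {variable} before the final script is written out:

```bash
export SUBJECTS_DIR={analysisrootdir}     #  expands to e.g. /home/user/thePipeline/S1234ABC/1/
echo "{pipelinename} run on {uidstudynum}" > {analysisrootdir}/runinfo.txt     # {NOLOG} output logging not appended
if [ -d {analysisrootdir}/analysis ]; then echo exists; fi     # {NOCHECKIN} compound command, no check-in or echo added
recon-all -subjid analysis -all     #  full reconstruction, wrapped with a profiler {PROFILE}
```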
