dataSpec is a JSON array describing the criteria used to find data when searching a database (NiDB for example, since this pipeline is usually connected to a database). It contains the following variables: [Search] variables specify how to find data in the database, and [Export] variables specify how the data is exported.
Variable | Type | Default | Description |
---|---|---|---|
AssociationType | string | | [Search] `study` or `subject`. |
BehavioralDirectory | string | | [Export] If `BehFormat` writes data to a subdirectory, the subdirectory is given this name. |
BehavioralDirectoryFormat | string | | [Export] `nobeh`, `behroot`, `behseries`, `behseriesdir`. |
DataFormat | string | | [Export] `native`, `dicom`, `nifti3d`, `nifti4d`, `analyze3d`, `analyze4d`, `bids`. |
Enabled | bool | | [Search] `true` if the step is enabled, `false` otherwise. |
Gzip | bool | | [Export] `true` if converted NIfTI data should be gzipped, `false` otherwise. |
ImageType | string | | [Search] Comma-separated list of image types, often derived from the DICOM ImageType tag (0008,0008). |
DataLevel | string | | [Search] `nearestintime` or `samestudy`. Specifies where the data comes from. |
Location | string | | [Export] Directory, relative to the `analysisroot`, where this data item will be written. |
Modality | string | | [Search] Modality to search for. |
NumberBOLDreps | string | | [Search] If `SeriesCriteria` is set to `usecriteria`, search based on this option. |
NumberImagesCriteria | string | | [Search] |
Optional | bool | | [Search] `true` if this data step is optional; `false` if it is required, in which case the analysis will not run if the data step is not found. |
Order | number | | The numerical order of this data step. |
PreserveSeries | bool | | [Export] `true` to preserve series numbers, `false` to assign new ordinal numbers. |
PrimaryProtocol | bool | | [Search] `true` if this data step determines the primary study, from which subsequent analyses are run. |
Protocol | string | | [Search] Comma-separated list of protocol name(s). |
SeriesCriteria | string | | [Search] Criteria for which series are downloaded if more than one matches: `all`, `first`, `last`, `largest`, `smallest`, `usecriteria`. |
UsePhaseDirectory | bool | | [Export] `true` to write data to a subdirectory based on the phase-encoding direction. |
UseSeriesDirectory | bool | | [Export] `true` to write each series to its own directory, `false` to write data to the root export directory. |
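For illustration, a hypothetical dataSpec entry might look like the following. The protocol names, directory, and other values are made up; a real entry would use values matching your own database.

```json
[
  {
    "Order": 1,
    "Protocol": "T1w,MPRAGE",
    "Modality": "MR",
    "AssociationType": "study",
    "Enabled": true,
    "Optional": false,
    "PrimaryProtocol": true,
    "SeriesCriteria": "first",
    "DataFormat": "nifti4d",
    "Gzip": true,
    "Location": "anat",
    "UseSeriesDirectory": false
  }
]
```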
Pipelines is a JSON array describing the methods used to analyze data after it has been collected. In other words, the experiment provides the methods to collect the data, and the pipelines provide the methods to analyze it once collected.
Files associated with this section are stored in the following directory, where `PipelineName` is the unique name of the pipeline:

`/pipelines/<PipelineName>`
Variable | Type | Default | Description |
---|---|---|---|
ClusterType | string | | Compute cluster engine (`sge` or `slurm`). |
ClusterUser | string | | Username used to submit jobs. |
ClusterQueue | string | | Queue to submit jobs to. |
ClusterSubmitHost | string | | Hostname to submit jobs to. |
CompleteFiles | JSON array | | JSON array of complete files, with paths relative to the `analysisroot`. |
CreateDate | datetime | | Date the pipeline was created. |
DataCopyMethod | string | | How the data is copied to the analysis directory: `cp`, `softlink`, `hardlink`. |
DependencyDirectory | string | | |
DependencyLevel | string | | |
DependencyLinkType | string | | |
Description | string | | Longer pipeline description. |
DirectoryStructure | string | | |
Directory | string | | Directory where the analyses for this pipeline will be stored. Leave blank to use the default location. |
Group | string | | ID or name of a group on which this pipeline will run. |
GroupType | string | | Either `subject` or `study`. |
Level | number | | Subject-level analysis (`1`) or group-level analysis (`2`). |
MaxWallTime | number | | Maximum allowed clock (wall) time, in minutes, for the analysis to run. |
ClusterMemory | number | | Amount of memory, in GB, requested for a running job. |
PipelineName | string | | Pipeline name. |
Notes | string | | Extended notes about the pipeline. |
NumberConcurrentAnalyses | number | 1 | Number of analyses allowed to run at the same time. This number is managed by NiDB and is distinct from the grid engine queue size. |
ClusterNumberCores | number | 1 | Number of CPU cores requested for a running job. |
ParentPipelines | string | | Comma-separated list of parent pipelines. |
ResultScript | string | | Executable script run at completion of the analysis to find results and insert them back into NiDB. |
SubmitDelay | number | | Delay, in hours after the study datetime, before submitting to the cluster. Allows time to upload behavioral data. |
TempDirectory | string | | Path to a temporary directory on a compute node, if one is used. |
UseProfile | bool | | `true` if using the profile option, `false` otherwise. |
UseTempDirectory | bool | | `true` if using a temporary directory, `false` otherwise. |
Version | number | 1 | Version of the pipeline. |
PrimaryScript | string | | See the details of pipeline scripts below. |
SecondaryScript | string | | See the details of pipeline scripts below. |
DataStepCount | number | | Number of data steps. |
VirtualPath | string | | Path of this pipeline within the squirrel package. |
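A minimal hypothetical pipeline object, showing a subset of these variables (the pipeline name and all values are illustrative; PrimaryScript is omitted since its format is described in the next section):

```json
{
  "PipelineName": "freesurferUnified6",
  "Description": "Subject-level FreeSurfer analysis",
  "Level": 1,
  "ClusterType": "slurm",
  "ClusterQueue": "normal",
  "MaxWallTime": 4320,
  "ClusterMemory": 16,
  "ClusterNumberCores": 4,
  "NumberConcurrentAnalyses": 1,
  "DataCopyMethod": "cp",
  "Version": 1,
  "DataStepCount": 1,
  "VirtualPath": "pipelines/freesurferUnified6"
}
```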
Below are details about how pipeline scripts are formatted for squirrel and NiDB.
Pipeline scripts are meant to run in `bash`. They are traditionally formatted to run on a RHEL distribution such as CentOS or Rocky Linux. The scripts are bash compliant, but have some nuances that allow them to run more effectively under an NiDB pipeline setup.

The bash script is interpreted to run on a cluster. Some commands are added to your script to allow it to check in and report status to NiDB as it runs.

There is no need for a shebang line at the beginning (for example, `#!/bin/sh`) because the interpreter is only interested in the commands being run.
Example script...
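A minimal hypothetical sketch of such a script (the FreeSurfer commands and paths are illustrative, not prescribed by NiDB):

```bash
# Run FreeSurfer recon-all (this comment becomes the step description)
recon-all -sd {analysisrootdir} -s freesurfer -i {analysisrootdir}/anat/T1.nii.gz -all

# Copy a stats file into a results directory
mkdir -p {analysisrootdir}/results
cp {analysisrootdir}/freesurfer/stats/aseg.stats {analysisrootdir}/results/
```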
Before being submitted to the cluster, the script is passed through the NiDB interpreter, and the actual bash script that is submitted looks like the example below. This script runs on subject `S2907GCS`, study `8`, under the `freesurferUnified6` pipeline, and is then submitted to the cluster.
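A rough hypothetical sketch of this transformed script (the grid engine directives, check-in arguments, and paths are illustrative):

```bash
#$ -N freesurferUnified6      # grid engine directives added by NiDB
#$ -l h_rt=72:00:00           # (illustrative values)
#$ -o /home/user/freesurferUnified6/S2907GCS/8/pipeline

# each command is wrapped with a check-in, an echo, and output logging
nidb cluster -u pipelinecheckin ...   # arguments elided
echo "recon-all -sd /home/user/freesurferUnified6/S2907GCS/8 ... -all"
recon-all -sd /home/user/freesurferUnified6/S2907GCS/8 -s freesurfer -i /home/user/freesurferUnified6/S2907GCS/8/anat/T1.nii.gz -all >> /home/user/freesurferUnified6/S2907GCS/8/pipeline/step1.log 2>&1
```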
How to interpret the altered script:

- Details for the grid engine are added at the beginning. This includes max wall time, output directories, the run-as user, etc.
- Each command is changed to include logging and check-ins:
  - `nidb cluster -u pipelinecheckin` checks the current step in to the database. This is displayed on the Pipelines --> Analysis webpage.
  - Each command is also echoed to the grid engine log file, so you can check that log file for status.
  - The output of each command is appended to a separate log file on the last line using the `>>` operator.
There are a few pipeline variables that are interpreted by NiDB when the pipeline runs. Each variable is replaced with its value before the final script is written out. Every study on which a pipeline runs therefore gets a different script, with different paths, IDs, and values for the variables listed below.
Variable | Description |
---|---|
{NOLOG} | Does not append `>>` to the end of a command to log its output. |
{NOCHECKIN} | Does not prepend a command with a check-in, and does not echo the command being run. This is useful (necessary) when running multi-line commands like for loops and if/then statements. |
{PROFILE} | Prepends the command with a profiler to output information about CPU and memory usage. |
{analysisrootdir} | The full path to the analysis root directory, e.g. `/home/user/thePipeline/S1234ABC/1/`. |
{subjectuid} | The UID of the subject being analyzed, e.g. `S1234ABC`. |
{studynum} | The study number of the study being analyzed, e.g. `2`. |
{uidstudynum} | UID and study number together, e.g. `S1234ABC2`. |
{pipelinename} | The pipeline name. |
{studydatetime} | The study datetime, e.g. `2022-07-04 12:34:56`. |
{first_ext_file} | Replaced with the first file (alphabetically) found with the `ext` extension. |
{first_n_ext_files} | Replaced with the first `N` files (alphabetically) found with the `ext` extension. |
{last_ext_file} | Replaced with the last file (alphabetically) found with the `ext` extension. |
{all_ext_files} | Replaced with all files (alphabetically) found with the `ext` extension. |
{command} | The command being run, e.g. `ls -l`. |
{workingdir} | The current working directory. |
{description} | The description of the command. This is anything following the `#`, also called a comment. |
{analysisid} | The analysisID of the analysis. This is useful when inserting analysis results, as the analysisID is required to do that. |
{subjectuids} | [Second-level analysis] List of subjectIDs. |
{studydatetimes} | [Second-level analysis] List of studyDateTimes in the group. |
{analysisgroupid} | [Second-level analysis] The analysisID. |
{uidstudynums} | [Second-level analysis] List of UIDStudyNums. |
{numsubjects} | [Second-level analysis] Total number of subjects in the group analysis. |
{groups} | [Second-level analysis] List of group names contributing to the group analysis. Sometimes used when comparing groups. |
{numsubjects_groupname} | [Second-level analysis] Number of subjects within the specified group name. |
{uidstudynums_groupname} | [Second-level analysis] Number of studies within the specified group name. |
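For illustration, a few of these variables used together in a pipeline script might look like the following (assuming the `ext` in the file variables is spelled out literally, e.g. `{first_nii_file}` for the first `.nii` file; the commands themselves are hypothetical):

```bash
# Motion-correct the first .nii file found in the analysis directory;
# {first_nii_file} and {analysisrootdir} are replaced before submission
mcflirt -in {first_nii_file} -out {analysisrootdir}/func_mc

# {NOCHECKIN} is required for multi-line constructs such as if/then blocks
{NOCHECKIN} if [ -f {analysisrootdir}/func_mc.nii.gz ]; then
  echo "motion correction complete for {uidstudynum}"
fi
```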