Pipelines are the methods used to analyze data after it has been collected. In other words, the experiment provides the methods to collect the data and the pipelines provide the methods to analyze the data once it has been collected.
Files associated with this section are stored in the following directory, where PipelineName is the unique name of the pipeline:

/pipelines/<PipelineName>
dataSpec describes the criteria used to find data when searching a database (NiDB, for example, since this pipeline is usually connected to a database). dataSpec is a JSON array of the following variables: Search variables specify how to find data in the database, and Export variables specify how the data is exported.
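A single dataSpec entry might be sketched as below. This is written out from bash to stay in the pipeline's own language, and the field names (Protocol, Modality, Enabled) are assumptions for illustration only, not the definitive squirrel schema.

```shell
#!/bin/bash
# Hypothetical dataSpec entry (field names are illustrative assumptions).
# A real dataSpec would contain the Search and Export variables described above.
cat > /tmp/dataSpec.json <<'EOF'
[
  { "Protocol": "T1w", "Modality": "MR", "Enabled": true }
]
EOF
cat /tmp/dataSpec.json
```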
Details about how pipeline scripts are formatted for squirrel and NiDB

Pipeline scripts are meant to run in bash. They are traditionally formatted to run on a RHEL distribution such as CentOS or Rocky Linux. The scripts are bash compliant, but have some nuances that allow them to run more effectively under an NiDB pipeline setup.

The bash script is interpreted to run on a cluster. Some commands are added to your script to allow it to check in and report its status to NiDB as it is running.
Example script...
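As a stand-in for the elided example, here is a minimal, hypothetical sketch of what a pipeline script body might look like: plain bash commands, one step per line, with a trailing comment that can serve as the step's description. The paths and commands are purely illustrative.

```shell
#!/bin/bash
# Hypothetical pipeline script body (illustrative only; a real script would
# run analysis tools such as FreeSurfer). One command per line.

mkdir -p /tmp/example_analysis                            # create analysis directory
echo "step 1 complete" > /tmp/example_analysis/step1.txt  # record first step
ls /tmp/example_analysis                                  # verify output
```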
Before being submitted to the cluster, the script is passed through the NiDB interpreter, and the actual bash script will look like the one below. This script runs on subject S2907GCS, study 8, under the freesurferUnified6 pipeline. This script will then be submitted to the cluster.
... script is submitted to the cluster
How to interpret the altered script
- Details for the grid engine are added at the beginning. This includes max wall time, output directories, run-as user, etc.
- Each command is changed to include logging and check-ins. nidb cluster -u pipelinecheckin checks the current step in to the database; this is displayed on the Pipelines --> Analysis webpage.
- Each command is also echoed to the grid engine log file, so you can check that log file for status.
- The output of each command is echoed to a separate log file on the last line, using the >> operator.
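The transformations above can be sketched as a small wrapper. This is a simplified stand-in, not the real interpreter's output: nidb_checkin here just echoes, where the real script would call nidb cluster -u pipelinecheckin, and the grid engine directives are omitted.

```shell
#!/bin/bash
# Sketch of how an NiDB-style interpreter wraps each command with a check-in,
# an echo to the grid engine log, and output redirection to a per-command log.
# 'nidb_checkin' is a hypothetical stand-in for 'nidb cluster -u pipelinecheckin'.

LOGFILE=/tmp/pipeline_step.log
: > "$LOGFILE"                      # start with an empty command-output log

nidb_checkin() {
    # A real pipeline would report the current step to the NiDB database here.
    echo "checkin: $1"
}

run_step() {
    local desc="$1"; shift
    nidb_checkin "$desc"            # check in the current step
    echo "running: $*"              # echo the command to the grid engine log
    "$@" >> "$LOGFILE" 2>&1         # append the command's output to its own log
}

run_step "list temp files" ls /tmp
```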
There are a few pipeline variables that are interpreted by NiDB when running. The variable is replaced with the value before the final script is written out. Each study on which a pipeline runs will have a different script, with different paths, IDs, and other variables listed below.
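The substitution step can be illustrated with plain bash string replacement. The placeholder names here ({subjectuid}, {studynum}) are assumptions for illustration; the subject and study values match the example above (S2907GCS, study 8).

```shell
#!/bin/bash
# Sketch of replacing pipeline variables with per-study values before the
# final script is written out. Placeholder names are illustrative only.

template='recon-all -subject {subjectuid}-{studynum}'
script=${template//\{subjectuid\}/S2907GCS}   # substitute the subject UID
script=${script//\{studynum\}/8}              # substitute the study number
echo "$script"
# prints: recon-all -subject S2907GCS-8
```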
Primary key, Required, and Computed variables are indicated in the tables below; the squirrel writer/reader should handle computed variables. The pipeline object's fields are listed at the end of this section under the columns Variable, Type, Default, and Description.
There is no need for a shebang at the beginning (for example #!/bin/sh), because this script is only interested in the commands being run.
| Variable | Description |
|---|---|
|  | This does not append |
|  | This does not prepend a command with a check-in, and does not echo the command being run. This is useful (necessary) when running multi-line commands like for loops and if/then statements. |
|  | This prepends the command with a profiler to output information about CPU and memory usage. |
|  | The full path to the analysis root directory. |
|  | The UID of the subject being analyzed. |
|  | The study number of the study being analyzed. |
|  | UID and study number together. |
|  | The pipeline name. |
|  | The study datetime. |
|  | Replaces the variable with the first file (alphabetically) found with the |
|  | Replaces the variable with the first |
|  | Replaces the variable with the last file (alphabetically) found with the |
|  | Replaces the variable with all files (alphabetically) found with the |
|  | The command being run. |
|  | The current working directory. |
|  | The description of the command. This is anything following the |
|  | The analysisID of the analysis. This is useful when inserting analysis results, as the analysisID is required to do that. |
|  | [Second level analysis] List of subjectIDs. |
|  | [Second level analysis] List of studyDateTimes in the group. |
|  | [Second level analysis] The analysisID. |
|  | [Second level analysis] List of UIDStudyNums. |
|  | [Second level analysis] Total number of subjects in the group analysis. |
|  | [Second level analysis] List of group names contributing to the group analysis. Sometimes this can be used when comparing groups. |
|  | [Second level analysis] Number of subjects within the specified |
|  | [Second level analysis] Number of studies within the specified |
| Variable | Type | Default | Description |
|---|---|---|---|
| ClusterType | string |  | Compute cluster engine (sge or slurm). |
| ClusterUser | string |  | Submit username. |
| ClusterQueue | string |  | Queue to submit jobs to. |
| ClusterSubmitHost | string |  | Hostname to submit jobs to. |
| CompleteFiles | JSON array |  | JSON array of complete files, with paths relative to analysisroot. |
| CreateDate | datetime |  | Date the pipeline was created. |
| DataCopyMethod | string |  | How the data is copied to the analysis directory: cp, softlink, or hardlink. |
| DependencyDirectory | string |  |  |
| DependencyLevel | string |  |  |
| DependencyLinkType | string |  |  |
| Description | string |  | Longer pipeline description. |
| DirectoryStructure | string |  |  |
| Directory | string |  | Directory where the analyses for this pipeline will be stored. Leave blank to use the default location. |
| Group | string |  | ID or name of a group on which this pipeline will run. |
| GroupType | string |  | Either subject or study. |
| Level | number |  | Subject-level analysis (1) or group-level analysis (2). |
| MaxWallTime | number |  | Maximum allowed clock (wall) time in minutes for the analysis to run. |
| ClusterMemory | number |  | Amount of memory in GB requested for a running job. |
| PipelineName | string |  | Pipeline name. |
| Notes | string |  | Extended notes about the pipeline. |
| NumberConcurrentAnalyses | number | 1 | Number of analyses allowed to run at the same time. This number is managed by NiDB and is different from the grid engine queue size. |
| ClusterNumberCores | number | 1 | Number of CPU cores requested for a running job. |
| ParentPipelines | string |  | Comma-separated list of parent pipelines. |
| ResultScript | string |  | Executable script to be run at completion of the analysis to find and insert results back into NiDB. |
| SubmitDelay | number |  | Delay, in hours after the study datetime, before submitting to the cluster. Allows time to upload behavioral data. |
| TempDirectory | string |  | The path to a temporary directory on a compute node, if used. |
| UseProfile | bool |  |  |