JSON array
This object is an array of subjects, with information about each subject.
AlternateIDs
JSON array
List of alternate IDs.
DateOfBirth
date
Subject’s date of birth. Used to calculate age-at-study. Value can be YYYY-00-00 to store year only, or YYYY-MM-00 to store year and month only.
Gender
char
Gender.
GUID
string
Ethnicity1
string
NIH-defined ethnicity: usually hispanic or non-hispanic.
Ethnicity2
string
NIH-defined race: americanindian, asian, black, hispanic, islander, or white.
Sex
char
Sex at birth (F,M,O,U).
SubjectID
string
Unique ID of this subject. Each subject ID must be unique within the package.
InterventionCount
number
Number of intervention objects.
ObservationCount
number
Number of observation objects.
StudyCount
number
Number of studies.
VirtualPath
string
Relative path to the data within the package.
JSON array
Array of imaging studies/sessions.
JSON array
Array of observations.
JSON array
Array of interventions.
Files associated with this section are stored in the following directory
/data/<SubjectID>
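A minimal subject entry might be sketched as follows. The field names come from the table above; the nested array key names (studies, observations, interventions) and all values are illustrative assumptions, not mandated by the spec.

```python
import json

# Sketch of one entry in the subjects array. Values are illustrative;
# the nested array key names are assumptions, not spec-defined.
subject = {
    "SubjectID": "S1234ABC",       # unique within the package
    "AlternateIDs": ["ALT001"],
    "DateOfBirth": "1990-07-00",   # YYYY-MM-00 stores year and month only
    "Sex": "F",                    # sex at birth (F, M, O, U)
    "Gender": "F",
    "Ethnicity1": "hispanic",
    "Ethnicity2": "white",
    "VirtualPath": "data/S1234ABC",
    "studies": [],
    "observations": [],
    "interventions": [],
}

print(json.dumps({"subjects": [subject]}, indent=2))
```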
Primary key Required Computed (squirrel writer/reader should handle these variables)
Globally unique identifier, from the NIMH Data Archive.
JSON array
An array of imaging studies, with information about each study. An imaging study (or imaging session) is defined as a set of related series collected on a piece of equipment during a time period. An example is a research participant receiving an MRI exam. The participant goes into the scanner, has several MR images collected, and comes out. The time spent in the scanner and all of the data collected from it is considered to be a study.
Valid squirrel modalities are derived from the DICOM standard and from NiDB modalities. Modality can be any string, but some squirrel readers may not correctly interpret the modality or may convert it to “other” or “unknown”. See full list of modalities.
AgeAtStudy
number
Subject’s age in years at the time of the study.
Datetime
datetime
Date and time of the study.
DayNumber
number
For repeated studies and clinical trials, this indicates the day number of this study in relation to time 0.
Description
string
Study description.
Equipment
string
Name of the equipment on which the imaging session was collected.
Height
number
Height in m of the subject at the time of the study.
Modality
string
StudyNumber
number
Study number. May be sequential or correspond to NiDB assigned study number.
StudyUID
string
DICOM field StudyUID.
TimePoint
number
Similar to day number, but this should be an ordinal number.
VisitType
string
Type of visit. ex: Pre, Post.
Weight
number
Weight in kg of the subject at the time of the study.
AnalysisCount
number
Number of analyses for this study.
SeriesCount
number
Number of series for this study.
VirtualPath
string
Relative path to the data within the package.
JSON array
Array of series.
JSON array
Array of analyses.
Files associated with this section are stored in the following directory, where SubjectID and StudyNum are the actual subject ID and study number, for example /data/S1234ABC/1.
/data/<SubjectID>/<StudyNum>
JSON object
The package root contains all data and files for the package. The JSON root contains all JSON objects for the package.
JSON object
Package information.
JSON object
Raw and analyzed data.
JSON object
Methods used to analyze the data.
JSON object
Experimental methods used to collect the data.
JSON object
Data dictionary containing descriptions, mappings, and key/value information for any variables in the package.
NumPipelines
number
Number of pipelines.
NumExperiments
number
Number of experiments.
TotalFileCount
number
Total number of data files in the package, excluding .json files.
TotalSize
number
Total size, in bytes, of the data files.
Files associated with this object are stored in the following directory.
/
JSON array
An array of series. Basic series information is stored in the main squirrel.json file. Extended information, including series parameters such as DICOM tags, is stored in a params.json file in the series directory.
BidsEntity
string
BidsSuffix
string
BIDS suffix
BIDSTask
string
BIDS Task name
BIDSRun
number
BIDS run number
BIDSPhaseEncodingDirection
string
BIDS PE direction
Description
string
Description of the series
ExperimentName
string
Protocol
string
Protocol name
Run
number
The run identifies the order of acquisition when there are multiple identical series.
SeriesDatetime
date
Date of the series, usually taken from the DICOM header
SeriesNumber
number
Series number. May be sequential, correspond to NiDB assigned series number, or taken from DICOM header
SeriesUID
string
From the SeriesUID DICOM tag
BehavioralFileCount
number
Total number of beh files (including files in subdirs)
BehavioralSize
number
Size of beh data, in bytes
FileCount
number
Total number of files (including files in subdirs)
Size
number
Size of the data, in bytes
JSON file
data/subjectID/studyNum/seriesNum/params.json
JSON object
Files associated with this section are stored in the following directory, where subjectID, studyNum, and seriesNum are the actual subject ID, study number, and series number, for example /data/S1234ABC/1/1.
/data/<SubjectID>/<StudyNum>/<SeriesNum>
Behavioral data is stored in /data/<SubjectID>/<StudyNum>/<SeriesNum>/beh
JSON object
This data object contains information about the subjects, and potential future data.
GroupAnalysisCount
number
Number of group analyses.
SubjectCount
number
Number of subjects in the package.
JSON array
Array containing the subjects.
JSON array
Array containing group analyses.
Files associated with this section are stored in the following directory, but actual binary data should be stored in the subjects or group-analysis sub directories.
/data
Separate JSON file - params.json
Series collection parameters are stored in a separate JSON file called params.json in the series directory. The JSON object contains any number of key/value pairs; this can be used for MRI sequence parameters.
All DICOM tags are acceptable parameters; see this list for available DICOM tags. Keys can be in either hexadecimal format (ID) or string format (Name), for example 0018:1030 or ProtocolName.
Files associated with this section are stored in the following directory, where subjectID, studyNum, and seriesNum are the actual subject ID, study number, and series number, for example /data/S1234ABC/1/1.
/data/<SubjectID>/<StudyNum>/<SeriesNum>/params.json
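A params.json object might look like the following sketch, mixing the two key formats. ProtocolName (0018:1030) and 0018:0087 (MagneticFieldStrength) are real DICOM tags; the values are illustrative.

```python
import json

# Hypothetical params.json content for one MR series. Keys may use the
# DICOM name format or the hexadecimal ID format; values are illustrative.
params = {
    "ProtocolName": "T1w",     # name-format key (tag 0018:1030)
    "0018:0087": "3.0",        # hex-format key (MagneticFieldStrength)
    "RepetitionTime": "2300",
}

print(json.dumps(params, indent=2))
```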
JSON array
Pipelines are the methods used to analyze data after it has been collected. In other words, the experiment provides the methods to collect the data and the pipelines provide the methods to analyze the data once it has been collected.
Files associated with this section are stored in the following directory, where PipelineName is the unique name of the pipeline.
/pipelines/<PipelineName>
JSON array
This object is an array of group analyses. A group analysis is considered an analysis involving more than one subject.
Files associated with this section are stored in the following directory, where <GroupAnalysisName> is the name of the analysis.
/group-analysis/<GroupAnalysisName>/
Defines the type of data. See table of supported modalities.
BIDS entity (anat, fmri, dwi, etc.)
Experiment name associated with this series. Experiments link to the experiments section of the squirrel package.
{Key:Value}
A unique key, sometimes derived from the DICOM header, and its value. Examples: Protocol:T1w, FieldStrength:3.0
Details about how pipeline scripts are formatted for squirrel and NiDB
Pipeline scripts are meant to run in bash. They are traditionally formatted to run on a RHEL distribution such as CentOS or Rocky Linux. The scripts are bash compliant, but have some nuances that allow them to run more effectively under an NiDB pipeline setup.
The bash script is interpreted to run on a cluster. Some commands are added to your script to allow it to check in and give status to NiDB as it is running.
There is no need for a shebang line at the beginning (for example #!/bin/sh), because only the commands being run are of interest.
Example script...
Before being submitted to the cluster, the script is passed through the NiDB interpreter, and the actual bash script will look like below. This script is running on subject S2907GCS, study 8, under the freesurferUnified6 pipeline. This script will then be submitted to the cluster.
... script is submitted to the cluster
How to interpret the altered script
Details for the grid engine are added at the beginning
This includes max wall time, output directories, run-as user, etc
Each command is changed to include logging and check-ins
nidb cluster -u pipelinecheckin checks the current step in to the database. This is displayed on the Pipelines --> Analysis webpage
Each command is also echoed to the grid engine log file so you can check the log file for the status
The output of each command is appended to a separate log file using the >> operator
There are a few pipeline variables that are interpreted by NiDB when running. The variable is replaced with the value before the final script is written out. Each study on which a pipeline runs will have a different script, with different paths, IDs, and other variables listed below.
{NOLOG}
This does not append >> to the end of a command to log the output
{NOCHECKIN}
This does not prepend a command with a check in, and does not echo the command being run. This is useful (necessary) when running multi-line commands like for loops and if/then statements
{PROFILE}
This prepends the command with a profiler to output information about CPU and memory usage.
{analysisrootdir}
The full path to the analysis root directory. ex /home/user/thePipeline/S1234ABC/1/
{subjectuid}
The UID of the subject being analyzed. Ex S1234ABC
{studynum}
The study number of the study being analyzed. ex 2
{uidstudynum}
UID and studynumber together. ex S1234ABC2
{pipelinename}
The pipeline name
{studydatetime}
The study datetime. ex 2022-07-04 12:34:56
{first_ext_file}
Replaces the variable with the first file (alphabetically) found with the ext extension
{first_n_ext_files}
Replaces the variable with the first N files (alphabetically) found with the ext extension
{last_ext_file}
Replaces the variable with the last file (alphabetically) found with the ext extension
{all_ext_files}
Replaces the variable with all files (alphabetically) found with the ext extension
{command}
The command being run. ex ls -l
{workingdir}
The current working directory
{description}
The description of the command. This is anything following the #, also called a comment
{analysisid}
The analysisID of the analysis. This is useful when inserting analysis results, as the analysisID is required to do that
{subjectuids}
[Second level analysis] List of subjectIDs
{studydatetimes}
[Second level analysis] List of studyDateTimes in the group
{analysisgroupid}
[Second level analysis] The analysisID
{uidstudynums}
[Second level analysis] List of UIDStudyNums
{numsubjects}
[Second level analysis] Total number of subjects in the group analysis
{groups}
[Second level analysis] List of group names contributing to the group analysis. Sometimes this can be used when comparing groups
{numsubjects_groupname}
[Second level analysis] Number of subjects within the specified groupname
{uidstudynums_groupname}
[Second level analysis] List of UIDStudyNums within the specified groupname
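As a sketch of how these substitutions behave, the snippet below replaces a few of the variables above in a command template. The substitution logic is illustrative only, not NiDB's actual interpreter, and recon-all is just an example command.

```python
# Illustrative sketch: NiDB's real interpreter also handles logging and
# check-ins. This shows the basic idea of replacing {variable} tokens
# with per-study values before the final script is written out.
def substitute(template: str, values: dict) -> str:
    for name, value in values.items():
        template = template.replace("{" + name + "}", str(value))
    return template

values = {
    "analysisrootdir": "/home/user/thePipeline/S1234ABC/1",
    "subjectuid": "S1234ABC",
    "studynum": 1,
    "uidstudynum": "S1234ABC1",
}
command = "recon-all -i {analysisrootdir}/T1.nii -s {uidstudynum}"
print(substitute(command, values))
# -> recon-all -i /home/user/thePipeline/S1234ABC/1/T1.nii -s S1234ABC1
```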
Variable
Type
Default
Description
DateStart
date
Datetime of the start of the analysis.
DateEnd
date
Datetime of the end of the analysis.
DateClusterStart
date
Datetime the job began running on the cluster.
DateClusterEnd
date
Datetime the job finished running on the cluster.
Hostname
string
If run on a cluster, the hostname of the node on which the analysis ran.
PipelineName
string
Name of the pipeline used to generate these results.
PipelineVersion
number
1
Version of the pipeline used.
RunTime
number
0
Elapsed wall time, in seconds, to run the analysis after setup.
SeriesCount
number
0
Number of series downloaded/used to perform analysis.
SetupTime
number
0
Elapsed wall time, in seconds, to copy data and set up analysis.
Status
string
Status, should always be ‘complete’.
StatusMessage
string
Last running status message.
Successful
bool
Analysis ran to completion without error and expected files were created.
Size
number
Size in bytes of the analysis.
VirtualPath
string
Relative path to the data within the package.
ClusterType
string
Compute cluster engine (sge or slurm).
ClusterUser
string
Submit username.
ClusterQueue
string
Queue to submit jobs.
ClusterSubmitHost
string
Hostname to submit jobs.
CompleteFiles
JSON array
JSON array of complete files, with paths relative to analysisroot.
CreateDate
datetime
Date the pipeline was created.
DataCopyMethod
string
How the data is copied to the analysis directory: cp, softlink, hardlink.
DependencyDirectory
string
DependencyLevel
string
DependencyLinkType
string
Description
string
Longer pipeline description.
DirectoryStructure
string
Directory
string
Directory where the analyses for this pipeline will be stored. Leave blank to use the default location.
Group
string
ID or name of a group on which this pipeline will run
GroupType
string
Either subject or study
Level
number
subject-level analysis (1) or group-level analysis (2).
MaxWallTime
number
Maximum allowed clock (wall) time in minutes for the analysis to run.
ClusterMemory
number
Amount of memory in GB requested for a running job.
PipelineName
string
Pipeline name.
Notes
string
Extended notes about the pipeline
NumberConcurrentAnalyses
number
1
Number of analyses allowed to run at the same time. This number is managed by NiDB and is independent of the grid engine queue size.
ClusterNumberCores
number
1
Number of CPU cores requested for a running job.
ParentPipelines
string
Comma separated list of parent pipelines.
ResultScript
string
Executable script to be run at completion of the analysis to find and insert results back into NiDB.
SubmitDelay
number
Delay in hours, after the study datetime, to submit to the cluster. Allows time to upload behavioral data.
TempDirectory
string
The path to a temporary directory if it is used, on a compute node.
UseProfile
bool
true if using the profile option, false otherwise.
UseTempDirectory
bool
true if using a temporary directory, false otherwise.
Version
number
1
Version of the pipeline.
PrimaryScript
string
See details of pipeline scripts
SecondaryScript
string
See details of pipeline scripts.
DataStepCount
number
Number of data steps.
VirtualPath
string
Path of this pipeline within the squirrel package.
JSON array
Datetime
datetime
Datetime of the group analysis.
Description
string
Description.
GroupAnalysisName
string
Name of this group analysis.
Notes
string
Notes about the group analysis.
FileCount
number
Number of files in the group analysis.
Size
number
Size in bytes of the analysis.
VirtualPath
string
Path to the group analysis data within the squirrel package.
Format specification for v1.0
A squirrel package contains a JSON file with metadata about all of the data in the package, and a directory structure to store files. While many data items are optional, a squirrel package must contain a JSON file and a data directory.
JSON File
JSON is JavaScript Object Notation, and many tutorials are available covering how to read and write JSON files. Within the squirrel format, keys are Pascal-case, for example DayNumber or DateOfBirth, where each word in the key is capitalized. The JSON file should be manually editable. JSON resources:
JSON tutorial - https://www.w3schools.com/js/js_json_intro.asp
JSON specification - https://www.json.org/json-en.html
Data types
The JSON specification includes several data types, but squirrel uses some derivative data types: string, number, date, datetime, char. Date, datetime, and char are stored as the JSON string datatype and should be enclosed in double quotes.
Type
Notes
Example
string
Regular string
“My string of text”
number
Any JSON acceptable number
3.14159 or 1000000
datetime
Datetime is formatted as YYYY-MM-DD HH:MI:SS
where all numbers are zero-padded and use a 24-hour clock. Datetime is stored as a JSON string datatype
“2022-12-03 15:34:56”
date
Date is formatted as YYYY-MM-DD
“1990-01-05”
char
A single character
F
bool
true or false
true
JSON array
Item is a JSON array of any data type
JSON object
Item is a JSON object
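The date and datetime string forms above can be produced as in the sketch below; the helper names are mine, not part of the spec.

```python
from datetime import datetime

# Produce the zero-padded, 24-hour string forms described in the table
# above. Helper names are illustrative, not spec-defined.
def squirrel_datetime(dt: datetime) -> str:
    return dt.strftime("%Y-%m-%d %H:%M:%S")

def squirrel_date(dt: datetime) -> str:
    return dt.strftime("%Y-%m-%d")

dt = datetime(2022, 12, 3, 15, 34, 56)
print(squirrel_datetime(dt))  # -> 2022-12-03 15:34:56
print(squirrel_date(dt))      # -> 2022-12-03
```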
Directory Structure
The JSON file squirrel.json is stored in the root directory. A directory called data contains any data described in the JSON file. Files can be of any type, with any file extension. Because of the broad range of environments in which squirrel files are used, filenames must contain only alphanumeric characters: no special characters or spaces, and less than 255 characters in length.
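The filename rule can be sketched as a simple check. Allowing "." for the extension is my assumption, since files "can be of any type, with any file extension".

```python
import re

# Sketch of the filename rule: alphanumeric only, under 255 characters.
# The "." is permitted here on the assumption that extensions are allowed.
def is_valid_squirrel_filename(name: str) -> bool:
    return len(name) < 255 and re.fullmatch(r"[A-Za-z0-9.]+", name) is not None

print(is_valid_squirrel_filename("file001.nii"))  # -> True
print(is_valid_squirrel_filename("my file.nii"))  # -> False (contains a space)
```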
Squirrel Package
A squirrel directory becomes a package once the entire directory structure is combined into a zip file. The compression level does not matter, as long as the file is a .zip archive. Once created, this package can be distributed to other instances of NiDB, read by squirrel readers, or simply unzipped and manually extracted. Packages can be created manually or exported using NiDB or squirrel converters.
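A minimal packaging sketch using Python's standard zipfile module; all paths and JSON content are illustrative, and a real package would carry the full metadata described in this specification.

```python
import json
import os
import tempfile
import zipfile

# Build a toy package layout: squirrel.json at the root plus a data/ tree,
# then combine it into a .zip archive. Contents are illustrative only.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "data", "S1234ABC", "1", "1"))
with open(os.path.join(root, "squirrel.json"), "w") as f:
    json.dump({"package": {"PackageName": "demo", "PackageFormat": "squirrel"}}, f)
with open(os.path.join(root, "data", "S1234ABC", "1", "1", "file001.nii"), "wb") as f:
    f.write(b"")  # placeholder data file

pkg_path = os.path.join(root, "package.zip")
with zipfile.ZipFile(pkg_path, "w", zipfile.ZIP_DEFLATED) as z:
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if full != pkg_path:  # don't include the archive in itself
                z.write(full, os.path.relpath(full, root))

print(sorted(zipfile.ZipFile(pkg_path).namelist()))
```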
JSON array
Experiments describe how data was collected from the participant. In other words, the methods used to get the data. This does not describe how to analyze the data once it’s collected.
ExperimentName
string
Unique name of the experiment.
FileCount
number
Number of files contained in the experiment.
Size
number
Size, in bytes, of the experiment files.
VirtualPath
string
Path to the experiment within the squirrel package.
Files associated with this section are stored in the following directory, where ExperimentName is the unique name of the experiment.
/experiments/<ExperimentName>
JSON object
This object contains information about the squirrel package.
Changes
string
Any CHANGES files.
DataFormat
string
orig
Data format for imaging data to be written. Squirrel should attempt to convert to the specified format if possible: orig, anon, anonfull, nifti3d, nifti3dgz, nifti4d, nifti4dgz (see details below).
Datetime
datetime
Datetime the package was created.
Description
string
Longer description of the package.
License
string
Any sharing or license notes, or LICENSE files.
NiDBVersion
string
The NiDB version which wrote the package.
Notes
JSON object
See details below.
PackageName
string
Short name of the package.
PackageFormat
string
squirrel
Always squirrel.
Readme
string
Any README files.
SeriesDirectoryFormat
string
orig
orig, seq (see details below).
SquirrelVersion
string
Squirrel format version.
SquirrelBuild
string
Build version of the squirrel library and utilities.
StudyDirectoryFormat
string
orig
orig, seq (see details below).
SubjectDirectoryFormat
string
orig
orig, seq (see details below).
orig - Original subject, study, series directory structure format. Example: S1234ABC/1/1
seq - Sequential, zero-padded numbers. Example: 00001/0001/00001
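The two directory formats can be sketched as path builders; the zero-padding widths (5/4/5 digits) are inferred from the example 00001/0001/00001 and are not stated explicitly in the spec.

```python
# Path-building sketch for the orig and seq directory formats.
# Zero-padding widths are inferred from the example, not spec-mandated.
def orig_path(subject_id: str, study_num: int, series_num: int) -> str:
    return f"data/{subject_id}/{study_num}/{series_num}"

def seq_path(subject_seq: int, study_seq: int, series_seq: int) -> str:
    return f"data/{subject_seq:05d}/{study_seq:04d}/{series_seq:05d}"

print(orig_path("S1234ABC", 1, 1))  # -> data/S1234ABC/1/1
print(seq_path(1, 1, 1))            # -> data/00001/0001/00001
```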
orig - Original, raw data format. If the original format was DICOM, the output format should be DICOM. See DICOM anonymization levels for details.
anon - If the original format is DICOM, write anonymized DICOM, removing most PHI except dates. See DICOM anonymization levels for details.
anonfull - If the original format is DICOM, the files will be fully anonymized by removing dates, times, and locations in addition to PHI. See DICOM anonymization levels for details.
nifti3d - Nifti 3D format. Example: file001.nii, file002.nii, file003.nii
nifti3dgz - gzipped Nifti 3D format. Example: file001.nii.gz, file002.nii.gz, file003.nii.gz
nifti4d - Nifti 4D format. Example: file.nii
nifti4dgz - gzipped Nifti 4D format. Example: file.nii.gz
Notes about the package are stored here. This includes import and export logs, and notes from imported files. This is generally a freeform object, but notes can be divided into sections.
import
Any notes related to import. BIDS files such as README and CHANGES are stored here.
merge
Any notes related to the merging of datasets, such as information about renumbering of subject IDs.
export
Any notes related to the export process.
Files associated with this section are stored in the following directory
/
JSON object
The data-dictionary object stores information describing mappings or any other descriptive information about the data. This can also contain any information that doesn't fit elsewhere in the squirrel package, such as project descriptions.
Examples include mapping numeric values (1,2,3,...) to descriptions (F, M, O, ...)
data-dictionary
DataDictionaryName
string
Name of this data dictionary.
NumFiles
number
Number of files contained in the data dictionary.
Size
number
Size, in bytes, of the data-dictionary files.
VirtualPath
string
Path to the data-dictionary within the squirrel package.
data-dictionary-item
JSON array
Array of data dictionary items. See next table.
data-dictionary-item
VariableType
string
Type of variable.
VariableName
string
Name of the variable.
Description
string
Description of the variable.
KeyValueMapping
string
List of possible key/value mappings in the format key1=value1, key2=value2. Example: 1=Female, 2=Male
ExpectedTimepoints
number
Number of expected timepoints. For example, the study is expected to have 5 records of a variable.
RangeLow
number
For numeric values, the lower limit.
RangeHigh
number
For numeric values, the upper limit.
Files associated with this section are stored in the following directory.
/data-dictionary
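Parsing the KeyValueMapping format described in the table above can be sketched as follows; the helper name is mine, not part of the spec.

```python
# Parse a KeyValueMapping string ("key1=value1, key2=value2") into a dict,
# e.g. for mapping stored codes to human-readable labels.
def parse_key_value_mapping(mapping: str) -> dict:
    pairs = [p.strip() for p in mapping.split(",") if p.strip()]
    return dict(p.split("=", 1) for p in pairs)

print(parse_key_value_mapping("1=Female, 2=Male"))
# -> {'1': 'Female', '2': 'Male'}
```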
JSON array
dataSpec describes the criteria used to find data if searching a database (NiDB for example, since this pipeline is usually connected to a database). The dataSpec is a JSON array of the following variables. Search variables specify how to find data in a database, and Export variables specify how the data is exported.
AssociationType
string
[Search] study or subject.
BehavioralDirectory
string
[Export] If BehFormat writes data to a sub directory, the directory should be given this name.
BehavioralDirectoryFormat
string
[Export] nobeh, behroot, behseries, behseriesdir.
DataFormat
string
[Export] native, dicom, nifti3d, nifti4d, analyze3d, analyze4d, bids.
Enabled
bool
[Search] true if the step is enabled, false otherwise.
Gzip
bool
[Export] true if converted Nifti data should be gzipped, false otherwise.
ImageType
string
[Search] Comma separated list of image types, often derived from the DICOM ImageType tag, (0008:0008).
DataLevel
string
[Search] nearestintime, samestudy. Where the data is coming from.
Location
string
[Export] Directory, relative to the analysisroot, where this data item will be written.
Modality
string
[Search] Modality to search for.
NumberBOLDreps
string
[Search] If SeriesCriteria is set to usecriteria, then search based on this option.
NumberImagesCriteria
string
[Search]
Optional
bool
[Search] true if this data step is optional, false if it is required; the analysis will not run if a required data step is not found.
Order
number
The numerical order of this data step.
PreserveSeries
bool
[Export] true to preserve series numbers, false to assign new ordinal numbers.
PrimaryProtocol
bool
[Search] true if this data step determines the primary study, from which subsequent analyses are run.
Protocol
string
[Search] Comma separated list of protocol name(s).
SeriesCriteria
string
[Search] Criteria for which series are downloaded if more than one matches the criteria: all, first, last, largest, smallest, usecriteria.
UsePhaseDirectory
bool
[Export] true to write data to a sub directory based on the phase-encoding direction.
UseSeriesDirectory
bool
[Export] true to write each series to its own directory, false to write data to the root export directory.
JSON array
Interventions represent any substances or procedures administered to a participant, whether through a clinical trial or the participant’s use of prescription or recreational drugs. Detailed variables are available to record exactly how much of a drug was administered and when, which allows searching by dose amount or other variables.
AdministrationRoute
string
Drug entry route (oral, IV, unknown, etc).
DateRecordCreate
string
Date the record was created in the current database. The original record may have been imported from another database.
DateRecordEntry
string
Date the record was first entered into a database.
DateRecordModify
string
Date the record was modified in the current database.
DateEnd
datetime
Datetime the intervention was stopped.
DateStart
datetime
Datetime the intervention was started.
Description
string
Longer description.
DoseString
string
Full dosing string. Examples: tylenol 325mg twice daily by mouth, or 5g marijuana inhaled by volcano.
DoseAmount
number
In combination with other dose variables, the quantity of the drug.
DoseFrequency
string
Description of the frequency of administration.
DoseKey
string
For clinical trials, the dose key.
DoseUnit
string
mg, g, ml, tablets, capsules, etc.
InterventionClass
string
Drug class.
InterventionName
string
Name of the intervention.
Notes
string
Notes about the drug.
Rater
string
Rater/experimenter name.
The following examples convert between common language and the squirrel storage format.
esomeprazole 20mg capsule by mouth daily
InterventionClass
PPI
InterventionName
esomeprazole
DoseAmount
20
DoseFrequency
daily
AdministrationRoute
oral
DoseUnit
mg
2 puffs atrovent inhaler every 6 hours
InterventionName
ipratropium
InterventionClass
bronchodilator
DoseAmount
2
DoseFrequency
every 6 hours
AdministrationRoute
inhaled
DoseUnit
puffs
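The second example above, expressed as a sketch of an intervention record using the field names from the table; all values are illustrative.

```python
import json

# Sketch of one entry in the interventions array, matching the
# "2 puffs atrovent inhaler every 6 hours" example. Values illustrative.
intervention = {
    "InterventionName": "ipratropium",
    "InterventionClass": "bronchodilator",
    "DoseAmount": 2,
    "DoseUnit": "puffs",
    "DoseFrequency": "every 6 hours",
    "AdministrationRoute": "inhaled",
    "DoseString": "2 puffs atrovent inhaler every 6 hours",
}

print(json.dumps(intervention, indent=2))
```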
JSON array
Observations are collected from a participant in response to an experiment.
DateEnd
datetime
End datetime of the observation.
DateRecordCreate
datetime
Date the record was created in the current database. The original record may have been imported from another database.
DateRecordEntry
datetime
Date the record was first entered into a database.
DateRecordModify
datetime
Date the record was modified in the current database.
DateStart
datetime
Start datetime of the observation.
Description
string
Longer description of the measure.
Duration
number
Duration of the measure in seconds, if known.
InstrumentName
string
Name of the instrument associated with this measure.
ObservationName
string
Name of the observation.
Notes
string
Detailed notes.
Rater
string
Name of the rater.
Value
string
Value (string or number).