After a DNA sample is sent to a sequencing company, the company returns "short reads", which are short sequences contained within the genome. Each short read comes with quality scores that describe the certainty of the nucleotide call at each position, and the reads are then analyzed with bioinformatics tools. Many of the commands run to analyze short reads take a long time, and it would be inefficient to wait for each command to finish before typing the next one by hand. A bash script solves this by collecting the commands into a single file, so an alignment script can run the steps one after another without supervision.
Another useful feature of bash scripting is that tools written in different languages, such as Perl, Java and Python, can be used within a single script. A file created by one tool can be modified by the next without the programmer being present. This matters because many tools take anywhere from seconds to days to complete, depending on the size of the input file. For example, to begin analyzing short reads, Trimmomatic is used to determine which reads can be considered “good reads”. The parameters of a good read, such as a required quality score over a length of the read, are passed to the program, and its output is then used by other tools, such as SAMtools (Sequence Alignment/Map tools).
Variables make scripting more efficient, since many experiments require reads from numerous individuals in a population. Without variables, the file name would have to be changed on every line of the script for each sample, which is time-consuming and unnecessary. With a variable, the user sets the file name once, such as ecoli shown in Figure 1, and it can then be referenced throughout the script as ${File1}. The program could be improved further by creating another script that changes the file name to the next sample automatically once the previous sample has completed, so the user can start the analysis and only periodically check for errors while it runs.
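The automatic hand-off from one sample to the next could be sketched with a loop rather than a second script. This is only an illustration: the sample names other than ecoli, the describe_sample function, and the file-naming scheme (.fastq, _trimmed.fastq, .sam) are all assumptions, and the function prints the file names instead of calling any real tool.

```shell
#!/usr/bin/env bash
# Sketch only: sample names (other than "ecoli") and the file-naming
# scheme are assumptions made for illustration.

# Print the chain of file names derived from a single sample name.
describe_sample() {
  File1="$1"
  echo "${File1}.fastq -> ${File1}_trimmed.fastq -> ${File1}.sam"
}

# The loop sets File1 to each sample in turn, so no line of the
# pipeline has to be edited by hand between samples.
for File1 in ecoli sample2 sample3; do
  describe_sample "${File1}"
done
```

In a real pipeline, the body of describe_sample would hold the actual trimming and alignment commands, each written once in terms of ${File1}.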
Another useful feature this program incorporates is the echo command, which prints to the screen the command that is about to run. While it may seem like an extra command, echo makes troubleshooting much more efficient. Without it, a user would have to check the entire script to find a single error in one command; with it, the command where the error occurred is printed to the screen, so the user can identify which tool failed.
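One way to get this behavior without writing a separate echo line for every command is a small wrapper function. The sketch below is an assumption about how that might be done, not the script from this post: run_step is a hypothetical helper, and the built-in commands true and false stand in for a tool that succeeds and a tool that fails.

```shell
#!/usr/bin/env bash
# Sketch only: run_step is a hypothetical helper, not part of any tool.

# Echo a command, run it, and report its exit status if it fails.
run_step() {
  echo "Running: $*"
  if "$@"; then
    return 0
  else
    status=$?
    echo "FAILED (exit ${status}): $*" >&2
    return "$status"
  fi
}

# "true" and "false" stand in for a succeeding and a failing tool.
run_step true
run_step false || echo "Pipeline stopped at the step printed above"
```

Because the command is printed before it runs, the last "Running:" line on the screen always names the step that was in progress when something went wrong.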
An example of a bash script that uses the echo command and variables is shown below:
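A minimal sketch of what such a script might look like follows. It is not the original figure: trim_reads and align_reads are hypothetical stand-in functions used in place of real tools such as Trimmomatic and SAMtools, the input file is a fabricated one-read placeholder, and the file names are assumptions. Only the sample name ecoli and the ${File1} variable come from the text above.

```shell
#!/usr/bin/env bash
# Sketch only: trim_reads and align_reads are hypothetical stand-ins
# for real tools, so this script runs without any software installed.

File1="ecoli"               # sample name, set once and reused below
workdir="$(mktemp -d)"      # scratch directory for the demo files

# Stand-in for read trimming: copies the input unchanged.
trim_reads() { cp "$1" "$2"; }
# Stand-in for alignment: records which file it was given.
align_reads() { echo "aligned $(basename "$1")" > "$2"; }

# Placeholder input so the sketch is self-contained.
printf '@read1\nACGT\n+\nIIII\n' > "${workdir}/${File1}.fastq"

# Each step is echoed before it runs, so a failure is easy to locate.
echo "trim_reads ${File1}.fastq ${File1}_trimmed.fastq"
trim_reads "${workdir}/${File1}.fastq" "${workdir}/${File1}_trimmed.fastq"

echo "align_reads ${File1}_trimmed.fastq ${File1}.sam"
align_reads "${workdir}/${File1}_trimmed.fastq" "${workdir}/${File1}.sam"

echo "Finished ${File1}"
```

Changing the single line File1="ecoli" is enough to rerun every step on a different sample.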