
NGSView Manual and Tutorial

Quick Start

To run NGSView, go to the NGSView directory and type

bash$ bin/ngsview -d /path/to/project/directory -c config/ngsviewconf.xml

with the correct path to the project directory. Additional command line options:
--full, -f
Start in full screen mode

--bytes, -b NUM
Start with database cache size of NUM bytes (default 100 MB)

--giga, -g NUM
Start with database cache size of NUM gigabytes (default 0)
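How the two cache options combine is not spelled out above. One plausible reading, and it is only an assumption, is that they feed the two size arguments of Berkeley DB's `DB_ENV->set_cachesize(gbytes, bytes, ncache)`, so the total cache is NUM gigabytes plus NUM bytes. The sketch below parses the options that way. `CacheSize`, `parseCacheOptions` and `totalCacheBytes` are hypothetical helpers, and treating "100 MB" and "1 GB" as binary units is also assumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical holder for the --giga/-g and --bytes/-b values.
// Assumption: NGSView adds them, like Berkeley DB's set_cachesize(gbytes, bytes, ...).
struct CacheSize {
    std::uint32_t gbytes = 0;                     // --giga, -g (default 0)
    std::uint32_t bytes = 100u * 1024u * 1024u;   // --bytes, -b (default "100 MB", MiB assumed)
};

// Scan the argument list; options other than -b/-g (e.g. -d, -c, -f) are ignored.
CacheSize parseCacheOptions(const std::vector<std::string>& args) {
    CacheSize cs;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& a = args[i];
        const bool isBytes = (a == "--bytes" || a == "-b");
        const bool isGiga = (a == "--giga" || a == "-g");
        if (!isBytes && !isGiga) continue;
        if (i + 1 >= args.size())
            throw std::invalid_argument(a + " requires a value");
        const std::uint32_t value = static_cast<std::uint32_t>(std::stoul(args[++i]));
        if (isBytes) cs.bytes = value; else cs.gbytes = value;
    }
    return cs;
}

// Total cache in bytes, taking 1 GB as 2^30 bytes (an assumption).
std::uint64_t totalCacheBytes(const CacheSize& cs) {
    return static_cast<std::uint64_t>(cs.gbytes) * (1ull << 30) + cs.bytes;
}
```

Under these assumptions, `-g 2 -b 500` would request 2 GiB plus 500 bytes of cache.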


Overview

Concepts

NGSView is a sequence alignment editing tool developed for handling large numbers of sequences, long or short. The main goal of NGSView is to give the user more power than other viewing tools allow. The user can move sequences around; cut, copy and paste them; run algorithms on them; choose between view modes; zoom in and out; et cetera. The user interface is a front end to a database, and changes made using NGSView are automatically reflected in the database.

The real power of NGSView comes into effect when custom tags are used. The user can add any tags (following the specified format) to the reads in the input, and then select sequences based on tag content. This makes it possible to very quickly create subsets of data, which can be worked on separately.

Directories

Each NGSView project needs its own project directory. This directory must be supplied on the command line with the option -d (or --dbhome-dir). The directory must be empty or contain an existing project. See Quick Start above for an example of how to start an NGSView session.

All contigs in one project are located in separate directories under the project directory. If a new contig is created, it will appear as a new directory in the project directory.

Supported file formats

The only format supported directly by NGSView is a native XML format. However, an all-purpose Perl script ("map2ngsview.pl") is included in the package, which makes it possible to import many types of column-based data into the viewer (e.g. BED, BLAST, Eland, mapview-processed MAQ, Corona). Run the script with the "--help" option for details.

For convenience, two additional scripts are included ("sam2col.pl" and "ace2col.pl") that convert SAM and ACE files to the column-based format, which can then be piped into the all-purpose parser. Here are examples of how to run them:

cat /path/to/sam_format_file.sam | ./sam2col.pl | ./map2ngsview.pl --name_col 1 \
  --seq_col 2 --qual_col 3 --chr_col 5 --strand_col 6 --start_col 7 \
  --end_col 8 --start_good_col 9 --end_good_col 10 --tag_col 11 \
  --out_dir /path/to/output_xml_dir/ --no_revcomp --ref_dir /path/to/ref_fasta_dir/


cat /path/to/ace_file.ace | ./ace2col.pl | ./map2ngsview.pl --name_col 1 --seq_col 2 \
  --chr_col 3 --strand_col 4 --start_col 5 --end_col 6 --start_good_col 7 \
  --end_good_col 8 --out_dir /path/to/output_xml_dir/ --no_revcomp
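The `--*_col` options take 1-based column numbers. As a rough illustration of what such a mapping does (not how map2ngsview.pl is actually written), the sketch below picks fields out of one column-based line using the same column numbers as the SAM example. Tab-separated columns are an assumption, and `ColumnLayout`, `MappedRead`, `splitTabs` and `parseLine` are hypothetical names.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical 1-based column layout, matching the SAM example above.
struct ColumnLayout {
    int name = 1, seq = 2, chr = 5, strand = 6, start = 7, end = 8;
};

struct MappedRead {
    std::string name, seq, chr, strand;
    long start = 0, end = 0;
};

// Split one line on tab characters (tab separation is assumed).
std::vector<std::string> splitTabs(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, '\t')) fields.push_back(field);
    return fields;
}

MappedRead parseLine(const std::string& line, const ColumnLayout& col) {
    const std::vector<std::string> f = splitTabs(line);
    // Translate a 1-based column number into a field, with a bounds check.
    auto at = [&f](int oneBased) -> const std::string& {
        if (oneBased < 1 || static_cast<std::size_t>(oneBased) > f.size())
            throw std::out_of_range("missing column " + std::to_string(oneBased));
        return f[oneBased - 1];
    };
    MappedRead r;
    r.name = at(col.name);
    r.seq = at(col.seq);
    r.chr = at(col.chr);
    r.strand = at(col.strand);
    r.start = std::stol(at(col.start));
    r.end = std::stol(at(col.end));
    return r;
}
```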


Views and windows

Multiple contigs can be open simultaneously, and multiple views of a contig can be open at the same time. This can be useful for viewing different parts of the same contig at the same time, for instance at different zoom levels. Changes made in one view are reflected in the other(s), since they represent the same document, i.e. contig. Open windows can be arranged within the workspace in overlapping or tiling fashion. The user can switch between windows using the Window menu, or by pressing the Tab key (Shift + Tab switches windows in reverse order).

The user interacts with the data (= reads) mainly using the mouse. Clicking on a read selects it (indicated by a red instead of a black border). If the user presses Ctrl while clicking on a read, this toggles the "selectedness" of the read. It's also possible to select reads by pressing the left mouse button where it doesn't hit a read and dragging the mouse. A dashed "rubber band" appears, selecting all reads intersecting it upon release of the button. If Ctrl is pressed during this operation, previously selected reads remain selected. Right clicking on selected reads opens a context menu with options for manipulating data (see below).
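The selection rules above behave like a small state machine over a set of read IDs. The sketch below models them with hypothetical types (`Read`, `Rect`, `Selection`); it is not NGSView's internal code, in which selected reads are held as database record numbers (see the API example further down). Treating a read as one row spanning columns [start, end] is an assumption made for the rubber band test.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical read geometry: one row, inclusive column span [start, end].
struct Read {
    int id;
    int row;
    int start;
    int end;
};

// Rubber band rectangle, inclusive bounds in row/column units.
struct Rect {
    int top, bottom, left, right;
};

bool intersects(const Read& r, const Rect& box) {
    return r.row >= box.top && r.row <= box.bottom &&
           r.end >= box.left && r.start <= box.right;
}

class Selection {
public:
    // Plain click selects only this read; Ctrl + click toggles it.
    void click(int readId, bool ctrl) {
        if (!ctrl) {
            selected_.clear();
            selected_.insert(readId);
        } else if (selected_.erase(readId) == 0) {
            selected_.insert(readId);
        }
    }
    // Clicking where no read is located de-selects everything.
    void clickEmpty() { selected_.clear(); }
    // On release, select every read intersecting the band;
    // with Ctrl held, the previous selection is kept.
    void rubberBand(const std::vector<Read>& reads, const Rect& box, bool ctrl) {
        if (!ctrl) selected_.clear();
        for (const Read& r : reads)
            if (intersects(r, box)) selected_.insert(r.id);
    }
    bool isSelected(int id) const { return selected_.count(id) != 0; }
    std::size_t size() const { return selected_.size(); }

private:
    std::set<int> selected_;
};
```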

Moving modes

In the normal moving mode (indicated by an arrow on the toolbar), selected reads can only be moved vertically within the present contig; horizontal movement and copying reads to other contigs by dragging are not allowed in this mode. In drag mode, reads may be moved horizontally within a contig or dragged to other contigs, in which case they are copied to the new contig. In the new contig, the reads end up in the top left corner.

When moving reads within the same contig, the last move is undoable. This is done by selecting "Undo last move" from the Edit menu. NOTA BENE: if the user changes the read selection, or runs an Operation (see below), the last move is no longer undoable. This is to make sure that the integrity of the data is kept.

Final notes on moving: when one or several reads are moved, NGSView will refuse to let any read end up on a negative row or column. If this would happen, the whole move is cancelled. NGSView will however let reads end up on the same positions, i.e. reads may overlap on the same row. The overlapping part is indicated by a green color.
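The moving rules can be sketched as follows, again with hypothetical types (`Pos`, `MoveController`). Every selected read is checked before anything changes, so a single read that would land on a negative row or column cancels the whole move; overlaps are deliberately not checked; and the last move stays undoable only until the selection changes or an Operation runs, modelled here by an explicit `invalidateUndo()` call. The normal-mode restriction to vertical moves is left out for brevity.

```cpp
#include <map>
#include <set>
#include <utility>

struct Pos {
    int row;
    int col;
};

// Hypothetical model of moving and "Undo last move" within one contig.
class MoveController {
public:
    explicit MoveController(std::map<int, Pos> reads) : reads_(std::move(reads)) {}

    // Move all selected reads by (dRow, dCol). If any read would end up on
    // a negative row or column, nothing is moved and false is returned.
    // Overlapping reads are allowed, so there is no collision check.
    bool move(const std::set<int>& selection, int dRow, int dCol) {
        for (int id : selection) {
            const Pos& p = reads_.at(id);
            if (p.row + dRow < 0 || p.col + dCol < 0) return false;
        }
        for (int id : selection) {
            reads_[id].row += dRow;
            reads_[id].col += dCol;
        }
        last_ = LastMove{selection, dRow, dCol};
        hasUndo_ = true;
        return true;
    }

    // Call when the read selection changes or an Operation is run.
    void invalidateUndo() { hasUndo_ = false; }

    // Reverse the last move, if it is still undoable.
    bool undoLastMove() {
        if (!hasUndo_) return false;
        for (int id : last_.ids) {
            reads_[id].row -= last_.dRow;
            reads_[id].col -= last_.dCol;
        }
        hasUndo_ = false;
        return true;
    }

    Pos position(int id) const { return reads_.at(id); }

private:
    struct LastMove {
        std::set<int> ids;
        int dRow;
        int dCol;
    };
    std::map<int, Pos> reads_;
    LastMove last_{};
    bool hasUndo_ = false;
};
```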

Zooming

The user can zoom in and out of the views using toolbar buttons, menu options or keyboard shortcuts. It is possible to zoom in one direction only if desired. At a low zoom level ("birdseye view"), certain sequence features are not displayed, as they would be unintelligible at this level anyway.

Sand box approach

By enabling editing operations such as cut, copy and paste, as well as vertical and horizontal drag and drop, NGSView makes it possible to take reads from different regions, chromosomes, experiments or even species and view them together. We call this the sand box approach.

As an example, consider a miRNA sequencing project where short RNA has been sequenced in a time course. The sand box approach enables the user to copy and paste reads from different loci into a new contig and compare their expression patterns visually side by side, rather than by scrolling back and forth, switching windows etc.

Another example is when there are one or more loci (with DNA, RNA, ChIP-seq or other data) that the user wants to share with others for inspection, without having to transfer gigabytes of data. It is then possible to copy selected regions into one or more new contigs, export them as XML, and send them to a collaborator by email.

As a third example, consider a user who has tagged the data before load time (e.g. based on average sequencing quality, or some user-defined tag) and, some time into the analysis, decides that some parts of the data (e.g. all reads from a specific Illumina lane) should be discarded. Discarding the reads is then simply a matter of using the tag-based find operation and deleting those reads from the contig, rather than having to remap or manually go back into the mapping file and then reload the data with all modifications reset.

In combination with the API (see below), this flexibility allows extensions to the system with few limits on what is possible.
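The third sand box example boils down to "find by tag, then delete". A minimal sketch, assuming each read carries its tags as name/value pairs (NGSView's actual tag format is defined by its input format and is not reproduced here); `TaggedRead` and `discardByTag` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory read with user-defined tags (tag name -> value).
struct TaggedRead {
    std::string name;
    std::map<std::string, std::string> tags;
};

// Remove every read whose tag `tagName` has the value `value`
// (e.g. all reads from one sequencing lane). Returns how many were removed.
std::size_t discardByTag(std::vector<TaggedRead>& contig,
                         const std::string& tagName, const std::string& value) {
    auto matches = [&](const TaggedRead& r) {
        auto it = r.tags.find(tagName);
        return it != r.tags.end() && it->second == value;
    };
    // remove_if keeps the surviving reads in their original order.
    auto firstRemoved = std::remove_if(contig.begin(), contig.end(), matches);
    const std::size_t removed = static_cast<std::size_t>(contig.end() - firstRemoved);
    contig.erase(firstRemoved, contig.end());
    return removed;
}
```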

Context menu

Right-clicking in a view opens a context menu. Four different options are available:

            The different Operations that have been implemented for NGSView so far. Currently these are:

            Info about reads and features at the clicked position. If no read, feature etc. is present at the position, no such info is displayed.

            Selection of different visualization modes for the data. Sequences can be visualized with quality as a grey scale behind or above the sequence, SNPs can be turned on or off, etc.

            Options for selecting all reads above, below, right and left of the position.

            Scrolls the view so that the selected position is in the top left corner (if there are enough rows and columns to make this possible).

            Scrolls the view to a specified position.
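Scrolling a position to the top left corner has to stop early when there are not enough rows or columns left. A sketch of that clamping for one axis; `scrollOffset` is a hypothetical helper, and clamping to the last full page is an assumption about the exact behaviour.

```cpp
#include <algorithm>

// Hypothetical: first visible row (or column) after asking to put `target`
// at the top left. `total` is the number of rows/columns in the contig and
// `visible` how many fit in the view. The result is clamped to [0, total - visible].
int scrollOffset(int target, int total, int visible) {
    const int maxOffset = std::max(0, total - visible);
    return std::min(std::max(target, 0), maxOffset);
}
```

The same function would be applied once for rows and once for columns.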


Menus

Contig

        Open
            Opens an existing contig or creates a new one (Ctrl + O)
        Close
            Closes current document and all its views (Ctrl + W)
        Exit
            Exits the application. Changes are automatically saved (Ctrl + Q)


Edit

        Undo last move
            Undoes the last move if, in the meantime, no Operation has been performed and the read selection hasn't changed (Ctrl + Z)
        Cut
            Cuts selected reads. They are also copied to the clipboard and can be pasted back into the contig or into another contig (Ctrl + X)
        Copy
            Copies selected reads to clipboard (Ctrl + C)
        Paste
            Pastes clipboard content into current contig (Ctrl + V)
        Select All
            Selects all reads (Ctrl + A)
        Select Between Rows
            Selects all reads between rows specified by user

        Select Between Cols
            Selects all reads between cols specified by user

        Find read
            Selects all reads that match the specified name

        Find tag
            Selects all reads that contain a tag matching the specified name
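A sketch of two of these selection commands over a hypothetical in-memory list of reads (`RowRead`, `selectBetweenRows`, `findRead`). Inclusive row bounds and exact, case-sensitive name matching are assumptions; the manual does not specify either.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical read record: database-style id, read name and row.
struct RowRead {
    int id;
    std::string name;
    int row;
};

// "Select Between Rows": reads whose row lies in [fromRow, toRow].
std::set<int> selectBetweenRows(const std::vector<RowRead>& reads, int fromRow, int toRow) {
    if (fromRow > toRow) std::swap(fromRow, toRow);  // accept bounds in either order
    std::set<int> ids;
    for (const RowRead& r : reads)
        if (r.row >= fromRow && r.row <= toRow) ids.insert(r.id);
    return ids;
}

// "Find read": all reads whose name equals `name` exactly.
std::set<int> findRead(const std::vector<RowRead>& reads, const std::string& name) {
    std::set<int> ids;
    for (const RowRead& r : reads)
        if (r.name == name) ids.insert(r.id);
    return ids;
}
```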

View

        Different self-explanatory choices for zooming, along with some actions that need more explanation:

        Show statistics
            Displays database statistics from Berkeley DB.
        Time course
            Draws time course as colors in reads. Blue = low, red = high. Normalized time course can also be drawn.
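A sketch of the time course coloring and of normalized mode. The blue, yellow, red progression is described in the tutorial below; the linear interpolation and the `lowCut`/`highCut` cutoffs are assumptions, since NGSView's own cutoffs (based on TPM levels observed in Fantom 4) are not given in this manual. `Rgb`, `normalize` and `timeCourseColor` are hypothetical names.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

struct Rgb {
    int r, g, b;
};

// Normalized mode: scale the values of one read so they sum to one.
// An all-zero time course is returned unchanged.
std::vector<double> normalize(const std::vector<double>& values) {
    const double sum = std::accumulate(values.begin(), values.end(), 0.0);
    std::vector<double> out(values);
    if (sum > 0.0)
        for (double& v : out) v /= sum;
    return out;
}

// Map a value to blue (low) -> yellow -> red (high) by linear interpolation
// between hypothetical cutoffs lowCut and highCut (values outside are clamped).
Rgb timeCourseColor(double v, double lowCut, double highCut) {
    double t = (v - lowCut) / (highCut - lowCut);
    t = std::min(1.0, std::max(0.0, t));
    if (t < 0.5) {
        const double u = t / 0.5;  // blue (0,0,255) -> yellow (255,255,0)
        return Rgb{static_cast<int>(255 * u), static_cast<int>(255 * u),
                   static_cast<int>(255 * (1 - u))};
    }
    const double u = (t - 0.5) / 0.5;  // yellow (255,255,0) -> red (255,0,0)
    return Rgb{255, static_cast<int>(255 * (1 - u)), 0};
}
```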

Window

        New view
            Opens up another view of current contig.
        Cascade
            Orders open windows in a cascading fashion.
        Tile
            Orders open windows in a tiling fashion.

Tools

        Import project
            Imports a dataset from an XML file
        Import Mate Pairs
            Imports mate pair data (see below)
        Import Time Course
            Imports time course data (see below)
        Normal Mode
            In this mode, read moving is only allowed in the vertical direction
        Drag Mode
            In this mode, it is possible to move reads horizontally within a contig, and copy reads to other contigs using dragging
        Export
            Exports contig to XML format.


Mate Pair data


Mate pair data should be in a file with one pair on each line, with the format

READNAME MATENAME MATELENGTH
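A minimal reader for this format, assuming the three fields are separated by whitespace (spaces or tabs) and that MATELENGTH is an integer. `MatePair` and `parseMatePairs` are hypothetical names; blank lines are skipped and malformed lines raise an error.

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct MatePair {
    std::string read;
    std::string mate;
    long mateLength;  // MATELENGTH, read as an integer (assumption)
};

// Parse "READNAME MATENAME MATELENGTH" lines, one pair per line.
std::vector<MatePair> parseMatePairs(std::istream& in) {
    std::vector<MatePair> pairs;
    std::string line;
    int lineNo = 0;
    while (std::getline(in, line)) {
        ++lineNo;
        std::istringstream fields(line);
        MatePair p;
        if (!(fields >> p.read)) continue;  // blank line
        if (!(fields >> p.mate >> p.mateLength))
            throw std::runtime_error("bad mate pair line " + std::to_string(lineNo));
        pairs.push_back(p);
    }
    return pairs;
}
```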

Time Course data


Time course data should be in a file with one entry on each line, with the format

READNAME TIMEVAL1 TIMEVAL2 ...
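A similar sketch for time course files, assuming whitespace-separated fields and numeric values. The values are kept in file order, since that order determines how they are drawn in the reads; `parseTimeCourse` is a hypothetical name, and in this sketch repeated read names simply have their values appended.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Parse "READNAME TIMEVAL1 TIMEVAL2 ..." lines into read name -> values.
std::map<std::string, std::vector<double>> parseTimeCourse(std::istream& in) {
    std::map<std::string, std::vector<double>> course;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string name;
        if (!(fields >> name)) continue;  // blank line
        std::vector<double>& values = course[name];
        double v;
        while (fields >> v) values.push_back(v);  // keep file order
    }
    return course;
}
```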

Native file format


NGSView uses a simple XML format, see file "format.txt" in the doc directory.

Safety, backups etc

Two issues are important to note about NGSView. Firstly, there is currently no "Undo" function implemented except for the last move. Edit operations made using NGSView are automatically reflected in the database, and changes have to be undone manually if the user changes his/her mind.

A way to get around this is to copy reads to new contigs and perform crucial editing there. If the editing doesn't have the desired effect, the new contigs can simply be discarded (by removing the contig directories AFTER the NGSView session). If the user wants to keep the changes, the original reads can then be cut out of the old contig if desired. This is actually an intended use of NGSView: a "sand box" approach where the user can copy data, play around with it and then decide whether to keep it or not.

Secondly, NGSView is under development, and the developers give no warranties regarding program performance and stability. It's probably a good idea to back up the contig directory now and then to be on the safe side. NOTE: backups of the project directory must be made between runs of the program, NOT during runs!

Extending NGSView

Using the API, it is possible to write extensions both for data types and, more importantly, for operations that can be performed on the data in the viewer. NGSView is written in C++ and uses the "pluggable object factory template" design pattern for adding data types and operations. For convenience, here is skeleton code and instructions for adding an operation called "Dummy" to the system. At least three files are needed: 1) a header file for the operation (in this case "dummyalgo.h"), which defines a class "DummyAlgo" that inherits from class "Algo" and must declare a "start()" function; 2) an implementation file ("dummyalgo.cc"; alternatively, the whole class can of course be implemented in the header); and 3) a maker file that registers the operation ("dummyalgomaker.cc"). Additional files can be added as needed.
After writing the code: 1) add the names of the header and .cc files to src.pro in the src/trapper directory, and 2) rerun qmake and make. This compiles the new code and links it into the application without having to recompile or rewrite any of the existing code.

File 1: dummyalgo.h
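#ifndef TRAPPER_DUMMY_H
#define TRAPPER_DUMMY_H

#include <set>

#include "algo.h"

// An Operation inherits Algo and must implement start()
class DummyAlgo : public Algo
{
public:
    // Pass the document, the selected reads (Berkeley DB record numbers)
    // and the parameters on to the Algo base class
    DummyAlgo(TrapperDoc * pDoc_, std::set< db_recno_t >& recnoList, AlgoParam* param)
        : Algo(pDoc_, recnoList, param) {}

    // Entry point of the Operation
    void start();
};

#endif // TRAPPER_DUMMY_H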

File 2: dummyalgo.cc
#include "dummyalgo.h"
#include "readdata.h"
#include "generaldata.h"

#include <cassert>
#include <set>

using namespace std;

void DummyAlgo::start()
{
    // Nothing to do if no reads are selected
    if ( selectedReads.empty() ) {
        return;
    }

    // Loop through all selected reads (stored as database record numbers)
    for (set< db_recno_t >::iterator it = selectedReads.begin(); it != selectedReads.end(); ++it) {

        // Iterator over the "ReadData" records of the current document
        Database::PrimaryIterator* read_it = new Database::PrimaryIterator(pDoc, "ReadData");

        // Position the iterator on the read with this record number
        int ret_read = read_it->setFromRecno(*it);
        ReadData* r_test = (ret_read != DB_NOTFOUND) ? read_it->answer() : 0;
        assert( r_test );

        // Do stuff to each read through the read_it->answer() pointer here, see API for details
        /* Examples:

        int rstart = read_it->answer()->startPos();
        int rend = read_it->answer()->endPos();

        */

        // Delete the iterator after use to avoid deadlock
        delete read_it;
    }
}


File 3: dummyalgomaker.cc
#include "algomaker.h"
#include "dummyalgo.h"

// The text that shows up in the popup menu
char dummyAlgo[] = "Dummy Operation";
// true if the Operation should pop up in the context menu
const bool dummyPopsup = true;

// Registers DummyAlgo with the operation factory by explicitly
// instantiating AlgoMakerTP's static registerThis member
template
const AlgoMakerTP< DummyAlgo, dummyAlgo, dummyPopsup > AlgoMakerTP< DummyAlgo, dummyAlgo, dummyPopsup >::registerThis;
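The definition of `AlgoMakerTP` lives in algomaker.h and is not shown here. To illustrate the general "pluggable object factory" idea behind File 3, here is a self-contained sketch with hypothetical names (`AlgoBase`, `registry`, `MakerTP`, `Dummy`): each template instantiation owns a static object whose constructor adds a creator function to a global registry, so a new operation registers itself without any existing code being edited. It is a simplified stand-in, not NGSView's actual implementation.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical minimal stand-in for NGSView's Algo base class.
struct AlgoBase {
    virtual ~AlgoBase() = default;
    virtual std::string start() = 0;
};

using Creator = std::unique_ptr<AlgoBase> (*)();

// Function-local static: exists before first use, whatever the init order.
std::map<std::string, Creator>& registry() {
    static std::map<std::string, Creator> r;
    return r;
}

// Each instantiation has one static registerThis object whose
// constructor registers a creator for T under the menu text Name.
template <class T, const char* Name>
struct MakerTP {
    MakerTP() { registry()[Name] = &create; }
    static std::unique_ptr<AlgoBase> create() { return std::unique_ptr<AlgoBase>(new T); }
    static const MakerTP registerThis;
};

template <class T, const char* Name>
const MakerTP<T, Name> MakerTP<T, Name>::registerThis;

// A "plug-in" operation; nothing else refers to Dummy directly.
struct Dummy : AlgoBase {
    std::string start() override { return "dummy ran"; }
};

extern const char dummyName[] = "Dummy Operation";

// Same trick as File 3: explicitly instantiating the static member
// forces the registering object to be constructed at startup.
template const MakerTP<Dummy, dummyName> MakerTP<Dummy, dummyName>::registerThis;
```

The registry is a function-local static so that it exists before any `registerThis` object is constructed, regardless of the order in which translation units are initialized.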


Wish list

This section contains a list of functionality that may be implemented in the future - developer help would be much appreciated!

Known bugs and problems

Please send bug reports, including instructions on how to reproduce the bug, to (remove caps) erik.arnerNOSPAM@gmail.com.

Please send feature requests, and feedback regarding current functionality of the program, to (remove caps) erik.arnerNOSPAM@gmail.com.

Test data

Provided with this version is a file for testing NGSView functionality, and a utility program (map2ngsview.pl) for converting column-based map files generated by alignment software into NGSView's native XML format.

Test file:

Simple tutorial

Some pointers on how to get started with NGSView.

  1. Go to the NGSView installation directory.

  2. Make an empty directory called ngsviewtest that will be the new project directory.

  3. Open NGSView: bash$ bin/ngsview -d /path/to/ngsviewtest -c config/ngsviewconf.xml

  4. From the Tools menu, choose "Import project" and select the provided XML file in the file dialog. This imports the data into the database and may take several minutes, depending on disk speed.

  5. From the Contig menu, choose "Open" (or click the directory icon on the toolbar). A directory called "chr1" should now have appeared in your project directory. Choose it and click OK.

  6. You should now see an alignment consisting of black boxes on a white background. They represent the reads in the contig. If you zoom in, you will see the base sequences visualized. Zooming can be performed using toolbar buttons, by choosing from the View menu, or by pressing the "+" and "-" keys on your keyboard. If you press Ctrl at the same time, zooming is performed only in the X direction. Similarly, pressing Ctrl + Alt zooms in the Y direction only. You can scroll using the mouse or the arrow keys. Holding Ctrl while using the arrow keys speeds up scrolling.

  7. Try selecting and un-selecting some reads by Ctrl-clicking them. Also try the "rubber band" functionality for read selection by pressing the left mouse button where it doesn't hit a read and dragging the mouse pointer. Select all the reads by a) choosing "Select All" from the Edit menu, b) pressing Ctrl + A on the keyboard, or c) pressing the "Select All" button in the toolbar. If you click where no read is located, all reads are de-selected. Right-click on a read and get familiar with the options in the context menu that pops up (see above for a more detailed description of this menu). Look around in the alignment by zooming in and out, scrolling etc.

  8. Try the cluster scrolling method, available as arrows on the toolbar. Pressing the arrows scrolls to the next region that satisfies the criteria in the input boxes (X number of reads in a Y base window). Try different settings, but be aware that unreasonable parameters (e.g. one million reads within a one base window) will cause the application to search through all reads in the entire contig, which may take time.

  9. Try the time course visualization mode, available as an option under the View menu, and also as a multi-colored button to the left of the cluster scrolling windows. Time course data (or any other type of multi-experiment data like case/control, different patients etc) are visualized in the reads themselves, as colored dots along the read. The dots correspond to expression values in the order they were entered in the input trapper xml file -- their relative location within the read does NOT indicate spatial location along the read. Blue color means lowly expressed, and the color progresses through yellow towards red for higher values. Color cutoffs are based on TPM (tags per million) typically observed in the Fantom 4 project. Also try to view the time course data in normalized mode -- this normalizes the expression data to sum up to one. Using the normalized mode, it is possible to visually compare reads with varying expression rates.
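The cluster scrolling in step 8 can be sketched as a sliding window over sorted read start positions: a window of Y bases holding at least X reads can always be shifted to begin at a read start, so only windows starting at reads need checking. Counting read starts rather than overlapping reads, and searching strictly after the current position, are assumptions; `nextCluster` is a hypothetical name. When no window qualifies, the loop runs to the end of the contig, consistent with the warning about unreasonable parameters.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical: return the first read start p > from such that at least
// minReads read starts fall in [p, p + window), or -1 if there is none.
long nextCluster(std::vector<long> starts, long from, int minReads, long window) {
    std::sort(starts.begin(), starts.end());
    auto first = std::upper_bound(starts.begin(), starts.end(), from);
    // Two pointers: [lo, hi) holds the starts inside the window beginning at *lo.
    for (auto lo = first, hi = first; lo != starts.end(); ++lo) {
        if (hi < lo) hi = lo;
        while (hi != starts.end() && *hi < *lo + window) ++hi;
        if (hi - lo >= minReads) return *lo;
    }
    return -1;
}
```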


SNP Colors

Substitution   Color
A->T           Red
T->A           Dark Red
A->G           Green
G->A           Dark Green
A->C           Blue
C->A           Dark Blue
T->G           Cyan
G->T           Dark Cyan
T->C           Magenta
C->T           Dark Magenta
G->C           Yellow
C->G           Dark Yellow
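The SNP color table can be encoded as a simple lookup keyed on the two bases in the table's X->Y order. Which side is the reference base is not stated here, so the sketch does not assume one. `snpColor` is a hypothetical helper that returns an empty string for pairs not in the table.

```cpp
#include <map>
#include <string>
#include <utility>

// Color for the substitution written "from->to" in the SNP color table.
std::string snpColor(char from, char to) {
    static const std::map<std::pair<char, char>, std::string> colors = {
        {{'A', 'T'}, "Red"},     {{'T', 'A'}, "Dark Red"},
        {{'A', 'G'}, "Green"},   {{'G', 'A'}, "Dark Green"},
        {{'A', 'C'}, "Blue"},    {{'C', 'A'}, "Dark Blue"},
        {{'T', 'G'}, "Cyan"},    {{'G', 'T'}, "Dark Cyan"},
        {{'T', 'C'}, "Magenta"}, {{'C', 'T'}, "Dark Magenta"},
        {{'G', 'C'}, "Yellow"},  {{'C', 'G'}, "Dark Yellow"},
    };
    auto it = colors.find({from, to});
    return it == colors.end() ? std::string() : it->second;
}
```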