NGSView Manual and Tutorial
Quick Start
To run NGSView, go to the NGSView directory and type
bash$ bin/ngsview -d /path/to/project/directory -c config/ngsviewconf.xml
replacing /path/to/project/directory with the path to your project directory. Additional command line options:
--full, -f
Start in full screen mode
--bytes, -b NUM
Start with database cache size of NUM bytes (default 100 MB)
--giga, -g NUM
Start with database cache size of NUM gigabytes (default 0)
Overview
Concepts
NGSView is a
sequence alignment editing tool developed for handling large numbers
of sequences -- long or short. The main goal of NGSView is to provide
more power to the user than other viewing tools allow. The user can
move sequences around; cut, copy and paste them; run algorithms on
them; choose between view modes; zoom in and out; et cetera. The user
interface is a front end to a database, and changes made using NGSView
are automatically reflected in the database.
The real power
of NGSView comes into effect when custom tags are used. The user can
add any tags (following the specified format) to the reads in the
input, and then select sequences based on tag content. This makes it
possible to very quickly create subsets of data, which can be worked
on separately.
Directories
Each NGSView project needs its own project directory. This directory
must be supplied on the command line with the option -d (or
--dbhome-dir). The directory given must be empty or contain an
existing project. See Quick Start above for an example of how to start an NGSView
session.
All contigs in one project are located in separate directories under
the project directory. If a new contig is created, it will appear as a
new directory in the project directory.
Supported file formats
The only format supported directly by NGSView is a native XML
format. However, an all-purpose Perl script ("map2ngsview.pl") is
included in the package, and using this script it is possible to
import many types of data in column based format into the viewer
(e.g. BED, BLAST, Eland, mapview processed MAQ, Corona). Run the
script with the "--help" option for details.
For convenience, two additional scripts are included ("sam2col.pl" and
"ace2col.pl") which convert SAM and ACE format files to the column
based format, which can then be piped into the all-purpose
parser. Here are examples of how to run them:
cat /path/to/sam_format_file.sam | ./sam2col.pl | ./map2ngsview.pl --name_col 1 \
--seq_col 2 --qual_col 3 --chr_col 5 --strand_col 6 --start_col 7 \
--end_col 8 --start_good_col 9 --end_good_col 10 --tag_col 11 \
--out_dir /path/to/output_xml_dir/ --no_revcomp --ref_dir /path/to/ref_fasta_dir/
cat /path/to/ace_file.ace | ./ace2col.pl | ./map2ngsview.pl --name_col 1 --seq_col 2 \
--chr_col 3 --strand_col 4 --start_col 5 --end_col 6 --start_good_col 7 \
--end_good_col 8 --out_dir /path/to/output_xml_dir/ --no_revcomp
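The column options in the commands above (--name_col 1, --seq_col 2 and so on) appear to take 1-based column numbers in the column based format. The following C++ sketch only illustrates that indexing idea. It assumes tab-separated columns, which the manual does not state (run the script with "--help" for the actual details), and the helper names splitColumns and columnAt are hypothetical, not part of NGSView.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Split one line of column based input on a delimiter.
// Assumption: the delimiter is a tab; the manual does not specify it.
std::vector<std::string> splitColumns(const std::string& line, char delim = '\t')
{
    std::vector<std::string> cols;
    std::string field;
    std::istringstream in(line);
    while (std::getline(in, field, delim)) {
        cols.push_back(field);
    }
    return cols;
}

// Return the column with the given 1-based number, mirroring how an
// option such as "--name_col 1" would pick the first column.
const std::string& columnAt(const std::vector<std::string>& cols, std::size_t oneBased)
{
    if (oneBased == 0 || oneBased > cols.size()) {
        throw std::out_of_range("column number outside the line");
    }
    return cols[oneBased - 1];
}
```

With a line whose fifth column holds the chromosome name, columnAt(cols, 5) returns that name, which corresponds to passing --chr_col 5.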
Views and windows
Multiple contigs can be open simultaneously, and multiple views of a
contig can be open at the same time. This can be useful for e.g. viewing
different parts of the same contig at the same time, for instance at
different zoom levels. Changes made in one view will be reflected in the
other(s), since they represent the same document, i.e. contig. Open
windows can be arranged within the workspace in overlapping or tiling
fashion. The user can switch between windows using the Window menu, or by pressing the Tab
key (Shift + Tab switches the windows in reverse order).
The user interacts with the data (= reads) mainly using the mouse.
Clicking on a read selects it (indicated by a red instead of a black
border). If the user presses Ctrl while clicking on a read, this toggles
the "selectedness" of the read. It's also possible to select reads by
pressing the left mouse button where it doesn't hit a read and dragging
the mouse. A dashed "rubber band" appears, selecting all reads
intersecting it upon release of the button. If Ctrl is pressed during
this operation, previously selected reads remain selected. Right
clicking on selected reads opens a context menu with options for
manipulating data (see below).
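The selection rules described above can be sketched in C++ as follows. This is only an illustration of the described behavior, not NGSView's implementation: the Rect type, the use of indices to identify reads, and all function names are assumptions made for the example.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Axis-aligned rectangle in view coordinates; right and bottom are exclusive.
struct Rect {
    int left, top, right, bottom;
};

// True if the two rectangles share some area.
bool intersects(const Rect& a, const Rect& b)
{
    return a.left < b.right && b.left < a.right &&
           a.top < b.bottom && b.top < a.bottom;
}

// Rubber band selection: every read whose box intersects the band is selected.
// Without Ctrl the previous selection is replaced; with Ctrl it is kept.
std::set<std::size_t> rubberBandSelect(const std::vector<Rect>& readBoxes,
                                       const Rect& band,
                                       const std::set<std::size_t>& previous,
                                       bool ctrlPressed)
{
    std::set<std::size_t> selected;
    if (ctrlPressed) {
        selected = previous;
    }
    for (std::size_t i = 0; i < readBoxes.size(); ++i) {
        if (intersects(readBoxes[i], band)) {
            selected.insert(i);
        }
    }
    return selected;
}

// Ctrl + click toggles the "selectedness" of a single read.
void toggleSelection(std::set<std::size_t>& selected, std::size_t read)
{
    if (selected.erase(read) == 0) {
        selected.insert(read);
    }
}
```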
Moving modes
In the normal moving mode
(indicated by an arrow on the toolbar), selected reads can be moved
vertically within the present contig. No horizontal movement, or
copying reads to other contigs by dragging, is thus allowed in this
mode. In drag mode, reads may be moved horizontally within a contig or
dragged to other contigs, whereby they are copied to the new
contig. In the new contig, the reads end up in the top left
corner.
When moving
reads within the same contig, the last move is undoable. This is done
by selecting "Undo last move" from the Edit menu. NOTA BENE: if the
user changes the read selection, or runs an Operation (see below), the
last move is no longer undoable. This is to make sure that the
integrity of the data is kept.
Final notes on
moving: when one or several reads are moved, NGSView will refuse to
let reads end up on negative rows or columns. If this is the case, the
whole move will be cancelled. NGSView will however let reads end up on the
same positions, i.e. reads may overlap on the same row. The
overlapping part is indicated by a green color.
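The all-or-nothing rule for moves can be illustrated with a short C++ sketch. The Placement type and the function names are hypothetical and much simpler than whatever NGSView stores per read; the sketch only shows the rule that a move is cancelled as a whole if any read would land on a negative row or column, while overlaps on the same row are allowed.

```cpp
#include <vector>

// Hypothetical, simplified read placement: a row and a half-open column range.
struct Placement {
    int row;
    int startCol;
    int endCol; // exclusive
};

// Apply the same offset to all reads, or to none of them.
// Returns false and leaves the reads untouched if any read would
// end up on a negative row or column.
bool moveReads(std::vector<Placement>& reads, int dRow, int dCol)
{
    for (const Placement& r : reads) {
        if (r.row + dRow < 0 || r.startCol + dCol < 0) {
            return false; // cancel the whole move
        }
    }
    for (Placement& r : reads) {
        r.row += dRow;
        r.startCol += dCol;
        r.endCol += dCol;
    }
    return true;
}

// Overlap is permitted; this only detects it, for example so that the
// shared part could be drawn in a different color.
bool overlapOnSameRow(const Placement& a, const Placement& b)
{
    return a.row == b.row && a.startCol < b.endCol && b.startCol < a.endCol;
}
```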
Zooming
The user can zoom in and out of the views using toolbar buttons, menu
options or keyboard shortcuts. It is possible to zoom in one direction
only if desired. At a low zoom level ("birdseye view"), certain sequence
features are not displayed as they would be unintelligible at this level
anyway.
Sand box approach
By enabling editing operations such as cut, copy, and paste, as well
as vertical and horizontal drag and drop, NGSView offers the
possibility of taking reads from different regions, chromosomes,
experiments or even species and view them together. We call this the
sand box approach. As an example, consider a miRNA sequencing project
where short RNA has been sequenced in a time course. The sand box
approach enables the user to copy and paste reads from different loci
into a new contig, and compare their expression patterns visually side
by side rather than by scrolling back and forth, switching windows
etc. Another example could be when there are one or more loci (with
DNA, RNA, ChIP-seq or other data) which the user wants to share with
others for inspection, without having to transfer gigabytes of
data. It is then possible to copy selected regions into one or more new
contigs, export them as XML, and send them to a collaborator in an
email. As a third example, consider a case where a user has tagged the
data before load time (e.g. based on average sequencing quality, or
some user defined tag) and a while into analysis decides that some
parts of the data (e.g. all reads from a specific Illumina lane)
should be discarded. Discarding the reads is then simply a matter
of using the tag based find operation, and deleting those reads from
the contig, rather than having to remap or manually go back into the
mapping file, and then reload the data with all modifications
reset. In combination with the API (see below), this flexibility
allows for extensions to the system where few limits are imposed as to
what is possible.
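The third example, discarding all reads that carry a given tag, amounts to a filter over the reads of a contig. A minimal C++ sketch of that idea follows. It works on a hypothetical in-memory TaggedRead type with exact tag matching; NGSView itself operates on its database, and how the tag based find operation matches tags is not specified here.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory read: a name plus the tags attached before loading.
struct TaggedRead {
    std::string name;
    std::vector<std::string> tags;
};

// True if the read carries the given tag (exact match, an assumption).
bool hasTag(const TaggedRead& read, const std::string& tag)
{
    return std::find(read.tags.begin(), read.tags.end(), tag) != read.tags.end();
}

// Remove every read carrying the tag and return how many were removed.
std::size_t discardByTag(std::vector<TaggedRead>& contig, const std::string& tag)
{
    const std::size_t before = contig.size();
    contig.erase(std::remove_if(contig.begin(), contig.end(),
                                [&tag](const TaggedRead& r) { return hasTag(r, tag); }),
                 contig.end());
    return before - contig.size();
}
```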
Context menu
Right clicking in a view opens a context menu. Four different
options are available:
The different Operations that have been implemented for NGSView so far.
Currently these are:
- Find Mates - finds and selects mate pairs of selected reads, if such data has been imported.
- Optimize Layout - reorganizes the reads to occupy as little vertical space as possible.
- Select strand - highlights reads of specified strand (normal or reverse).
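To give an idea of what an operation like Optimize Layout has to do, here is a C++ sketch of a standard greedy interval packing: reads are visited by increasing start column and each is placed on the first row where it fits. This is a textbook technique, not necessarily the algorithm NGSView uses, and the Span type and packRows function are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical read extent: half-open column range plus the row assigned by the layout.
struct Span {
    int startCol;
    int endCol; // exclusive
    int row;
};

// Greedy packing: visit reads by increasing start column and put each one on the
// first row whose last read ends at or before this read's start.
// Returns the number of rows used.
std::size_t packRows(std::vector<Span>& reads)
{
    std::vector<std::size_t> order(reads.size());
    for (std::size_t i = 0; i < order.size(); ++i) {
        order[i] = i;
    }
    std::sort(order.begin(), order.end(), [&reads](std::size_t a, std::size_t b) {
        return reads[a].startCol < reads[b].startCol;
    });

    std::vector<int> rowEnd; // rowEnd[r] = end column of the last read placed on row r
    for (std::size_t idx : order) {
        Span& s = reads[idx];
        std::size_t r = 0;
        while (r < rowEnd.size() && rowEnd[r] > s.startCol) {
            ++r;
        }
        if (r == rowEnd.size()) {
            rowEnd.push_back(s.endCol); // no existing row fits, open a new one
        } else {
            rowEnd[r] = s.endCol;
        }
        s.row = static_cast<int>(r);
    }
    return rowEnd.size();
}
```

A new row is opened only when the read overlaps the last read on every existing row, so the number of rows equals the largest number of reads covering any single column.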
Info
Info about reads and features at the clicked position. If a read, feature etc. is
not present at the position, no such info is displayed.
- Read info - read name, strand, quality region, local position in read and list of the read's SNPs.
- Feature info - base, quality value and SNP ID, if any.
- General info - row and column in alignment.
Switch to viewmode
Selection of different visualization modes for the data. Sequences can be visualized with
quality as a grey scale behind or above the sequence, SNPs can be
turned on or off etc.
Options for selecting all reads above, below, right and left of position.
Scrolls the view so that the selected position is in the top left corner (if there
are enough rows and columns to make this possible).
Scrolls the view to specified position.
Menus
Contig
Open
Opens an existing contig or creates a new one (Ctrl + O)
Close
Closes current document and all its views (Ctrl + W)
Exit
Exits the application. Changes are automatically saved (Ctrl + Q)
Edit
Undo last move
Undoes the last move if, in the meantime, no Operation has been performed and the read selection
hasn't changed (Ctrl + Z)
Cut
Cuts selected reads. They are also copied to the clipboard and can be pasted back into
the contig or into another contig (Ctrl + X)
Copy
Copies selected reads to clipboard (Ctrl + C)
Paste
Pastes clipboard content into current contig (Ctrl + V)
Select All
Selects all reads (Ctrl + A)
Select Between Rows
Selects all reads between rows specified by user
Select Between Cols
Selects all reads between columns specified by user
Find read
Selects all reads that match the specified name
Find tag
Selects all reads that contain a tag that matches the specified name
View
Different self-explanatory choices for zooming, along with some actions that need more explanation:
Show statistics
Displays database statistics from Berkeley DB.
Time course
Draws time course as colors in reads. Blue = low, red = high. Normalized time
course can also be drawn.
Window
New view
Opens up another view of current contig.
Cascade
Orders open windows in a cascading fashion.
Tile
Orders open windows in a tiling fashion.
Tools
Import project
Imports a dataset from an XML file
Import Mate Pairs
Imports mate pair data (see below)
Import Time Course
Imports time course data (see below)
Normal Mode
In this mode, read moving is only allowed in the vertical direction
Drag Mode
In this mode, it is possible to move reads horizontally within a contig, and copy
reads to other contigs using dragging
Export
Exports contig to XML format.
Mate Pair data
Mate pair data should be in a file with one pair on each line, with the format
READNAME MATENAME MATELENGTH
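As an illustration of the line format, the following C++ sketch parses one mate pair line into its three fields. It assumes the fields are separated by whitespace and that MATELENGTH is an integer; neither is stated explicitly above. The MatePair type and parseMatePairLine function are hypothetical names, not part of NGSView.

```cpp
#include <sstream>
#include <string>

// One line of a mate pair file: READNAME MATENAME MATELENGTH.
struct MatePair {
    std::string readName;
    std::string mateName;
    long mateLength;
};

// Parse one whitespace-separated line (the separator is an assumption).
// Returns false if the line does not contain exactly three fields or
// if the third field is not an integer; "out" is only written on success.
bool parseMatePairLine(const std::string& line, MatePair& out)
{
    std::istringstream in(line);
    MatePair p;
    if (!(in >> p.readName >> p.mateName >> p.mateLength)) {
        return false;
    }
    std::string extra;
    if (in >> extra) {
        return false; // trailing field or garbage after the length
    }
    out = p;
    return true;
}
```

Checking a file this way before using "Import Mate Pairs" can catch malformed lines early.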
Time Course data
Time course data should be in a file with one entry on each line, with the format
READNAME TIMEVAL1 TIMEVAL2 ...
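The sketch below shows, in C++, how one time course line could be read and how the normalized mode mentioned in the tutorial (expression values scaled to sum to one) can be computed. Whitespace-separated fields and numeric values are assumptions, and the TimeCourse type and both function names are hypothetical, not NGSView's own code.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One line of a time course file: READNAME TIMEVAL1 TIMEVAL2 ...
struct TimeCourse {
    std::string readName;
    std::vector<double> values;
};

// Parse one whitespace-separated line (the separator is an assumption).
// Returns false if there is no read name, no value, or a non-numeric value.
bool parseTimeCourseLine(const std::string& line, TimeCourse& out)
{
    std::istringstream in(line);
    TimeCourse tc;
    if (!(in >> tc.readName)) {
        return false;
    }
    double v;
    while (in >> v) {
        tc.values.push_back(v);
    }
    if (!in.eof() || tc.values.empty()) {
        return false; // stopped on a non-numeric field, or no values at all
    }
    out = tc;
    return true;
}

// Scale the values so that they sum to one; an all-zero input is returned unchanged.
std::vector<double> normalizeToUnitSum(const std::vector<double>& values)
{
    double sum = 0.0;
    for (double v : values) {
        sum += v;
    }
    std::vector<double> result(values);
    if (sum != 0.0) {
        for (double& v : result) {
            v /= sum;
        }
    }
    return result;
}
```

After normalization, two reads with very different absolute expression levels but the same relative pattern over time yield the same values, which is what makes them visually comparable.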
Native file format
NGSView uses a simple XML format, see file "format.txt" in the doc directory.
Safety, backups etc
Two issues
are important to note about NGSView. Firstly, there is currently no
"Undo" function implemented except for the last move. Edit operations
made using NGSView are automatically reflected in the database, and
changes have to be undone manually if the user changes his/her mind.
A way to get around this is to copy reads to new contigs and perform
crucial editing in the new contigs. If the editing doesn't have the
desired effect, the contigs can simply be discarded (by removing the
contig directories AFTER the NGSView session). If the user wants to
keep the changes, the reads can be cut out of the old contig if that's
what's desired. This is actually an intended use of NGSView - a "sand
box" approach where the user can copy data, play around with it and
then decide whether to keep it or not.
Secondly, NGSView is under development, and no warranties are issued
by the developers regarding program performance and stability. It's
probably a good idea to make backups of the contig directory now and
then to be on the safe side. NOTE: backup copying of the project
directory must be performed between runs of the program, NOT during runs!
Extending NGSView
Using the API, it is possible to write extensions both for data types
and, more importantly, operations that can be performed on the data in
the viewer. NGSView is written in C++ and uses the "pluggable object
factory template" design pattern for adding data types and
operations. For convenience, here is skeleton code and instructions
for adding an operation called "Dummy" to the system. At least three
files are needed: 1) the header file for the operation (in this case
"dummyalgo.h"), which defines a class "DummyAlgo" which inherits class
"Algo" and must declare a "start()" function, 2) an implementation
file ("dummyalgo.cc", the whole class CAN of course be implemented in
the header as an alternative if desired), and 3) a maker file which
registers the operation ("dummyalgomaker.cc"). Additional files can be
added as needed.
After writing the code, the following is needed:
1) add the name of the header and .cc files to src.pro in the
src/trapper directory, 2) rerun qmake and make. This will cause the
new code to compile and be linked into the application, without having
to recompile or rewrite any of the existing code.
File 1: dummyalgo.h
#ifndef TRAPPER_DUMMY_H
#define TRAPPER_DUMMY_H
#include "algo.h"
class DummyAlgo : public Algo
{
public:
    DummyAlgo(TrapperDoc * pDoc_, std::set< db_recno_t >& recnoList, AlgoParam* param)
        : Algo(pDoc_, recnoList, param) {}
    void start();
};
#endif // TRAPPER_DUMMY_H
File 2: dummyalgo.cc
#include <cassert>
#include <set>
#include "dummyalgo.h"
#include "readdata.h"
#include "generaldata.h"
using namespace std;
void DummyAlgo::start()
{
    if ( selectedReads.empty() ) {
        return;
    }
    //Loop through all selected reads
    for (set< db_recno_t >::iterator it = selectedReads.begin(); it != selectedReads.end(); ++it ) {
        Database::PrimaryIterator* read_it = new Database::PrimaryIterator(pDoc, "ReadData");
        int ret_read = read_it->setFromRecno(*it);
        ReadData* r_test = (ret_read != DB_NOTFOUND) ? read_it->answer() : 0;
        assert( r_test );
        //Do stuff to each read through read_it->answer() pointer here, see API for details
        /* Examples:
        int rstart = read_it->answer()->startPos();
        int rend = read_it->answer()->endPos();
        */
        //Delete iterator to avoid deadlock after use
        delete read_it;
    }
}
File 3: dummyalgomaker.cc
#include "algomaker.h"
#include "dummyalgo.h"
char dummyAlgo[]="Dummy Operation";//The text that shows up in the popup
const bool dummyPopsup = true;//true if it should pop up in context menu
template
const AlgoMakerTP< DummyAlgo, dummyAlgo, dummyPopsup > AlgoMakerTP< DummyAlgo, dummyAlgo, dummyPopsup >::registerThis;
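For readers unfamiliar with the pattern, here is a self-contained C++ sketch of a pluggable factory with self-registering makers. It is not NGSView's code: Operation, OperationMaker, OperationMakerTP and DummyOperation are hypothetical stand-ins, and only the idea of a static registerThis member is taken from the skeleton above. The sketch shows why adding a new operation requires no change to existing code: the static member registers the maker when the program starts.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical base class for operations (a stand-in, not NGSView's Algo).
class Operation {
public:
    virtual ~Operation() {}
    virtual std::string run() = 0;
};

// Base maker: each concrete maker registers itself under a name.
class OperationMaker {
public:
    // Create the operation registered under "name", or return null if unknown.
    static std::unique_ptr<Operation> create(const std::string& name)
    {
        auto it = registry().find(name);
        if (it == registry().end()) {
            return nullptr;
        }
        return it->second->make();
    }
    static bool isRegistered(const std::string& name)
    {
        return registry().count(name) != 0;
    }
protected:
    explicit OperationMaker(const std::string& name) { registry()[name] = this; }
    virtual ~OperationMaker() {}
    virtual std::unique_ptr<Operation> make() const = 0;
private:
    // A function-local static avoids depending on initialization order across files.
    static std::map<std::string, OperationMaker*>& registry()
    {
        static std::map<std::string, OperationMaker*> r;
        return r;
    }
};

// Maker template: one static instance per operation type performs the registration.
template <class T, const char* Name>
class OperationMakerTP : public OperationMaker {
    OperationMakerTP() : OperationMaker(Name) {}
    std::unique_ptr<Operation> make() const override
    {
        return std::unique_ptr<Operation>(new T);
    }
    static const OperationMakerTP registerThis;
};

template <class T, const char* Name>
const OperationMakerTP<T, Name> OperationMakerTP<T, Name>::registerThis;

// A new operation and its "maker file"; nothing above needs to change.
class DummyOperation : public Operation {
public:
    std::string run() override { return "dummy ran"; }
};

char dummyName[] = "Dummy Operation";
// Explicit instantiation forces the static member, and thus the registration, to exist.
template const OperationMakerTP<DummyOperation, dummyName>
    OperationMakerTP<DummyOperation, dummyName>::registerThis;
```

Calling OperationMaker::create("Dummy Operation") then returns a DummyOperation without the caller ever naming that class, which is the property the maker file relies on.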
Wish list
This section contains a list of functionality that may be implemented
in the future - developer help would be much appreciated!
- Improved tag search method, maybe redesign user defined tag scheme
- Implement general read clustering method based on feature data (e.g. time course data, SNP data, mate pair data etc)
- Enable cancelling of ongoing operations, and progress bars
- Move data import to separate thread to prevent app from hanging during import
- Port to Windows and Mac OS X
- Introduce "zoom to selection" feature
Known bugs and problems
- Some visual bugs when scrolling on chromosome wide zoom level.
Please send bug reports, including instructions on how to reproduce the
bug, to (remove caps) erik.arnerNOSPAM@gmail.com.
Please send feature requests, and feedback regarding current
functionality of the program, to (remove caps) erik.arnerNOSPAM@gmail.com.
Test data
Provided with this version is a file for testing NGSView
functionality, and a utility program (map2ngsview.pl)
for converting column based map files generated by alignment software into NGSView's XML format.
Test file:
- fantom4_cage_chr1.xml -
XML file with all chr1 CAGE tags from FANTOM4, plus identified Level 3
promoters, reference sequence and genes.
Simple tutorial
Some pointers
on how to get started with NGSView.
- Go to the NGSView installation directory.
- Make an empty directory called ngsviewtest that will be the new
project directory.
- Open NGSView: bash$ bin/ngsview -d /path/to/ngsviewtest -c config/ngsviewconf.xml
- From the Tools menu, choose "Import project" and choose the provided XML
file in the file dialog. This imports the data into the database, and will
take several minutes (depending on disk speed).
- From the Contig menu, choose "Open" (or click the directory icon on the
toolbar). A directory called "chr1" should now have appeared in your
project directory. Choose it and click OK.
- You should now see an alignment consisting of black boxes on
white background. They represent the reads in the contig. If you zoom
in, you will see the base sequences visualized.
Zooming can be performed using toolbar buttons or by choosing from the View menu;
alternatively you can press the "+" and "-" keys on your keyboard. If
you press Ctrl at the same time, the zooming is only performed in the
X-direction. Similarly, pressing Ctrl + Alt zooms in Y-direction only.
You can scroll using the mouse, or by using the arrow keys. Hitting Ctrl
while using the arrow keys speeds the scrolling up.
- Try selecting and un-selecting some reads by Ctrl-clicking them.
Also try the "rubber band" functionality for read selection by pressing
the left mouse button where it doesn't hit a read and dragging the mouse
pointer. Select all the reads by either a) choosing "Select All" from
the Edit menu, b) hitting Ctrl +
A on the keyboard, or c) pressing the "Select All" button in the
toolbar. If you click where no read is located, all reads are
de-selected. Right click on a read and get familiar with the different
options in the context menu that pops up (see above for more detailed
description of this menu). Look around in the alignment by zooming in and out, scrolling
etc.
- Try the cluster scrolling method, available as arrows on the
toolbar. Pressing the arrows will scroll to the next region that
satisfies the criteria in the input boxes (X number of reads in a Y
base window). Try different settings, but be aware that unreasonable
parameters (e.g. one million reads within a one base window) will
cause the application to search through all reads in the entire
contig, which may take time.
- Try the time course visualization mode, available as an option
under the View menu, and also as a multi-colored button to the left
of the cluster scrolling windows. Time course data (or any other
type of multi-experiment data like case/control, different patients
etc) are visualized in the reads themselves, as colored dots along
the read. The dots correspond to expression values in the order they
were entered in the input trapper xml file -- their relative
location within the read does NOT indicate spatial location along
the read. Blue color means lowly expressed, and the color progresses
through yellow towards red for higher values. Color cutoffs are
based on TPM (tags per million) typically observed in the FANTOM4
project. Also try to view the time course data in normalized mode --
this normalizes the expression data to sum up to one. Using the
normalized mode, it is possible to visually compare reads with
varying expression rates.
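The cluster scrolling criterion from the tutorial (at least X reads in a Y base window) can be sketched in C++ with a sliding window over sorted read start positions. The manual does not define exactly how reads are counted, so this sketch assumes that a read counts if it starts inside the window; the function name nextCluster and its parameters are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Given read start columns sorted in ascending order, return the start column of
// the first read after "from" that begins a window of windowSize bases containing
// at least minReads read starts. Returns -1 if no such window exists.
long nextCluster(const std::vector<long>& sortedStarts, long from,
                 std::size_t minReads, long windowSize)
{
    if (minReads == 0 || windowSize <= 0) {
        return -1;
    }
    std::size_t hi = 0; // one past the last read start inside the current window
    for (std::size_t lo = 0; lo < sortedStarts.size(); ++lo) {
        if (sortedStarts[lo] <= from) {
            continue; // only look to the right of the current position
        }
        if (hi < lo) {
            hi = lo;
        }
        while (hi < sortedStarts.size() &&
               sortedStarts[hi] < sortedStarts[lo] + windowSize) {
            ++hi;
        }
        if (hi - lo >= minReads) {
            return sortedStarts[lo];
        }
    }
    return -1;
}
```

When no window satisfies the criterion, the loop visits every read before giving up, which matches the warning above that unreasonable parameters cause a search through the entire contig.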
SNP Colors
SNP | Color
A->T | Red
T->A | Dark Red
A->G | Green
G->A | Dark Green
A->C | Blue
C->A | Dark Blue
T->G | Cyan
G->T | Dark Cyan
T->C | Magenta
C->T | Dark Magenta
G->C | Yellow
C->G | Dark Yellow