
Documentation for jCowboy

www.jcowboy.org

1. Overview

jCowboy is a fully automated server that predicts the secondary and tertiary structure of an amino acid sequence (the target). It uses different kinds of methods: locally installed programs, web tools and self-implemented methods.

The Javadoc can be found at http://www.jcowboy.org/javadoc/

2. Server structure

(PowerPoint slides)

2.1 Secondary structure prediction

jCowboy uses three tools for secondary structure prediction in order to obtain an optimal prediction. For how to choose the methods, see 4.3 Configuration.

2.1.1 Psipred (local)

2.1.2 PHD (Web)

2.1.3 JPred (Web)

Because Psipred runs locally, jCowboy can predict the secondary structure independently of other servers. This prediction is used as additional information in the template search (2.2.2) to increase its specificity.

2.2 Template search

As with secondary structure prediction, jCowboy offers more than one method for finding an optimal template for the given target sequence. In this case there are two methods. For how to choose the methods, see 4.3 Configuration.

2.2.1 Psi-Blast (local)

The Psi-Blast search is a useful tool for finding a template for the target in a given database. However, it uses no additional information beyond the multiple alignment.

2.2.2 Profile-Profile Alignment

The scoring scheme (log-average scoring) was first used by Niklas von Oehsen (arby).
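Log-average scoring compares two profile columns as follows. It averages the substitution odds ratios P(a,b)/(P(a)P(b)) over all residue pairs, weights each pair by the two columns' residue frequencies, and then takes the logarithm of that average. The minimal sketch below assumes that each column is a probability vector over the amino acid alphabet. It also assumes that the joint pair probabilities and background frequencies of a substitution model are given. The class and method names are hypothetical and do not represent jCowboy's code.

```java
/** Sketch of log-average scoring for two profile columns (hypothetical helper, not jCowboy's API). */
final class LogAverageScore {
    private LogAverageScore() {}

    /**
     * @param p     residue frequencies of the target profile column (sums to 1)
     * @param q     residue frequencies of the template profile column (sums to 1)
     * @param joint joint pair probabilities P(a,b) of an assumed substitution model
     * @param bg    background frequencies P(a) of the same model
     * @return log of the frequency-weighted average odds ratio
     */
    static double score(double[] p, double[] q, double[][] joint, double[] bg) {
        double avgOdds = 0.0;
        for (int a = 0; a < p.length; a++) {
            for (int b = 0; b < q.length; b++) {
                // odds ratio P(a,b) / (P(a) * P(b)), weighted by both column frequencies
                avgOdds += p[a] * q[b] * joint[a][b] / (bg[a] * bg[b]);
            }
        }
        // the logarithm is taken after averaging, hence "log-average"
        return Math.log(avgOdds);
    }
}
```

The score is taken as the logarithm of an average rather than as an average of logarithms. As a result, a single well-matching residue pair can dominate the column score.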

Our own profile-profile alignment implementation creates PSI-Blast profiles for a given database (in our case astral40). It then aligns the target profile against each of these profiles using the Gotoh algorithm. The alignment score is augmented with secondary structure information using a simple scoring scheme.
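The Gotoh algorithm computes an optimal alignment with affine gap costs. It uses three dynamic-programming matrices, one for alignments that end in an aligned pair and one for each sequence ending in a gap. The sketch below returns only a global alignment score. Several details are assumptions for illustration, because the text does not specify jCowboy's exact variant or its "simple scoring scheme":

- the global alignment mode
- the gap costs
- the hypothetical `ColumnScore` interface
- the size of the secondary structure bonus

A profile column score, such as the log-average score, can be plugged in as `ColumnScore`.

```java
/** Sketch of a global Gotoh alignment score with affine gaps (hypothetical, not jCowboy's code). */
final class Gotoh {
    /** Score for aligning target position i with template position j (0-based). */
    interface ColumnScore {
        double score(int i, int j);
    }

    private Gotoh() {}

    /** Adds a hypothetical bonus when the secondary structure states (e.g. H, E, C) agree. */
    static ColumnScore withSecondaryStructure(ColumnScore profile, String ssTarget,
                                              String ssTemplate, double bonus) {
        return (i, j) -> profile.score(i, j)
                + (ssTarget.charAt(i) == ssTemplate.charAt(j) ? bonus : 0.0);
    }

    /** Best global score; a gap of length k costs gapOpen + (k - 1) * gapExtend. */
    static double align(int n, int m, ColumnScore s, double gapOpen, double gapExtend) {
        double ninf = Double.NEGATIVE_INFINITY;
        double[][] match = new double[n + 1][m + 1];  // ends with target i aligned to template j
        double[][] gapTpl = new double[n + 1][m + 1]; // ends with target i aligned to a gap
        double[][] gapTgt = new double[n + 1][m + 1]; // ends with template j aligned to a gap
        for (int i = 0; i <= n; i++) {
            for (int j = 0; j <= m; j++) {
                match[i][j] = ninf;
                gapTpl[i][j] = ninf;
                gapTgt[i][j] = ninf;
            }
        }
        match[0][0] = 0.0;
        for (int i = 1; i <= n; i++) gapTpl[i][0] = -gapOpen - (i - 1) * gapExtend;
        for (int j = 1; j <= m; j++) gapTgt[0][j] = -gapOpen - (j - 1) * gapExtend;

        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                match[i][j] = s.score(i - 1, j - 1)
                        + max3(match[i - 1][j - 1], gapTpl[i - 1][j - 1], gapTgt[i - 1][j - 1]);
                // opening a gap costs gapOpen, extending an existing one costs gapExtend
                gapTpl[i][j] = max3(match[i - 1][j] - gapOpen, gapTpl[i - 1][j] - gapExtend,
                        gapTgt[i - 1][j] - gapOpen);
                gapTgt[i][j] = max3(match[i][j - 1] - gapOpen, gapTgt[i][j - 1] - gapExtend,
                        gapTpl[i][j - 1] - gapOpen);
            }
        }
        return max3(match[n][m], gapTpl[n][m], gapTgt[n][m]);
    }

    private static double max3(double a, double b, double c) {
        return Math.max(a, Math.max(b, c));
    }
}
```

Keeping separate gap matrices is what allows one long gap to be cheaper than several short ones. Structurally, a long gap is more plausible than several short ones.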

By combining profile information with secondary structure, jCowboy's alignments become more specific and sensitive.

The n best alignments by alignment score are selected, where n is the number of models to be created.
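Selecting the n best alignments does not require sorting all scores. The hypothetical sketch below assumes that each alignment is reduced to a template ID and a score. It keeps a min-heap of size n, so the weakest of the current best n alignments can be replaced cheaply. The `Hit` type and the IDs are illustrative only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Keeps the n highest-scoring alignments (hypothetical helper; Hit is an illustrative type). */
final class TopN {
    static final class Hit {
        final String templateId;
        final double score;

        Hit(String templateId, double score) {
            this.templateId = templateId;
            this.score = score;
        }
    }

    private TopN() {}

    /** Returns at most n hits, best score first. */
    static List<Hit> best(Iterable<Hit> hits, int n) {
        // min-heap: the root is always the weakest of the best n seen so far
        PriorityQueue<Hit> heap = new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        for (Hit h : hits) {
            if (heap.size() < n) {
                heap.add(h);
            } else if (n > 0 && h.score > heap.peek().score) {
                heap.poll(); // drop the weakest
                heap.add(h);
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return result;
    }
}
```

With the 3500 astral40 entries and n = 10, at most 10 alignments are held at any time.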

2.3 Modelling

The target is now modelled on the n templates found by the best pairwise alignments. For this step jCowboy uses one locally installed method, so that it remains independent of other servers (4.5).

2.3.1 Modeller (local)

MODELLER is used for homology or comparative modelling of protein three-dimensional structures. The alignment identifies the template and hence its known structure, and the target is modelled on the template structure. MODELLER automatically calculates a model containing all non-hydrogen atoms.

In this step, the backbone and side chains of the target are modelled, and a PDB file with atom coordinates is created.
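MODELLER takes the target-template alignment as input, typically as a file in its PIR alignment format. The text does not say how jCowboy passes the alignment. As a sketch, assuming each selected alignment is handed to MODELLER as such a file, the hypothetical helper below writes one for a single template and the target. The header lines follow the `structureX` and `sequence` conventions from the MODELLER manual. The template code, chain and target name are placeholders.

```java
/** Writes a two-entry MODELLER alignment in PIR format (hypothetical helper, not jCowboy's code). */
final class PirWriter {
    private PirWriter() {}

    /** Both gapped sequences ('-' for gaps) must come from the same pairwise alignment. */
    static String write(String templateCode, String chain, String gappedTemplate,
                        String targetCode, String gappedTarget) {
        if (gappedTemplate.length() != gappedTarget.length()) {
            throw new IllegalArgumentException("aligned sequences must have equal length");
        }
        StringBuilder sb = new StringBuilder();
        // template entry: MODELLER reads the coordinates from the PDB file named by the code
        sb.append(">P1;").append(templateCode).append('\n')
          .append("structureX:").append(templateCode).append(":FIRST:").append(chain)
          .append(":LAST:").append(chain).append(":::-1.00:-1.00\n")
          .append(gappedTemplate).append("*\n");
        // target entry: sequence only; this is the protein to be modelled
        sb.append(">P1;").append(targetCode).append('\n')
          .append("sequence:").append(targetCode).append(":::::::0.00: 0.00\n")
          .append(gappedTarget).append("*\n");
        return sb.toString();
    }
}
```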

2.4 Evaluation

To evaluate the produced models, jCowboy uses two methods, because two independent scores characterize a model better than one.

2.4.1 Prove (local)

This structure evaluation program assesses the model by the volumes of its individual atoms. A negative score means a smaller volume than average, and a positive score means a larger volume.
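The sign convention above is that of a Z-score: the deviation of an atom's volume from a reference mean, divided by the reference standard deviation. The minimal sketch below assumes that per-atom volumes have already been computed and that reference means and standard deviations per atom type are given. It illustrates only the Z-score idea and reports the mean Z-score as an example summary. The reference values, the `Atom` type and the summary are hypothetical, not PROVE's data or algorithm.

```java
import java.util.List;
import java.util.Map;

/** Z-scores of atomic volumes against reference values (hypothetical sketch). */
final class VolumeZScore {
    static final class Atom {
        final String type;   // atom type used to look up the reference, e.g. "C"
        final double volume; // volume computed for this atom in the model

        Atom(String type, double volume) {
            this.type = type;
            this.volume = volume;
        }
    }

    private VolumeZScore() {}

    /** Negative: smaller than the reference mean; positive: larger. */
    static double z(Atom atom, Map<String, double[]> reference) {
        double[] meanSd = reference.get(atom.type); // {mean, standard deviation}
        return (atom.volume - meanSd[0]) / meanSd[1];
    }

    /** Mean Z-score over all atoms of a model (an illustrative summary). */
    static double meanZ(List<Atom> atoms, Map<String, double[]> reference) {
        double sum = 0.0;
        for (Atom a : atoms) sum += z(a, reference);
        return atoms.isEmpty() ? 0.0 : sum / atoms.size();
    }
}
```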

2.4.2 Anolea (local)

Anolea calculates all atomic contacts in the non-local environment of every heavy atom in the protein. For every contact the predefined MFP (Mean Force Potential) is taken as the energy value. The energy profile of each amino acid is defined as the sum of the energy values of all its atoms, averaged over a window of size 5. If this value is larger than 0.0, this amino acid is declared a HEZ (High Energy Zone).

The Anolea score is the energy Z-score over the whole protein.
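The smoothing and HEZ step described above can be sketched as follows. The sketch assumes that per-residue energies (the sums of the atom contact energies) are already available. It also assumes that the window of size 5 is centred on each residue and truncated at the chain ends. The edge handling and the class name are assumptions, not taken from Anolea.

```java
/** Window-averaged residue energies and HEZ flags (hypothetical sketch of the step described above). */
final class EnergyProfile {
    private EnergyProfile() {}

    /** Averages per-residue energies over a centred window, truncated at the chain ends. */
    static double[] smooth(double[] residueEnergy, int window) {
        int half = window / 2;
        double[] out = new double[residueEnergy.length];
        for (int i = 0; i < residueEnergy.length; i++) {
            int from = Math.max(0, i - half);
            int to = Math.min(residueEnergy.length - 1, i + half);
            double sum = 0.0;
            for (int k = from; k <= to; k++) sum += residueEnergy[k];
            out[i] = sum / (to - from + 1);
        }
        return out;
    }

    /** A residue belongs to a High Energy Zone when its smoothed energy is larger than 0.0. */
    static boolean[] highEnergyZones(double[] residueEnergy) {
        double[] smoothed = smooth(residueEnergy, 5);
        boolean[] hez = new boolean[smoothed.length];
        for (int i = 0; i < smoothed.length; i++) hez[i] = smoothed[i] > 0.0;
        return hez;
    }
}
```

For the energies {-1, -1, -1, -1, -1, 4, 4, 4, -1}, the last five residues are flagged. The fourth residue's window averages to exactly 0.0, so it is not flagged.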

Further information: home/proj/bioprakt/genprak6/software/Anolea/anolea_ISMB_1997.pdf

2.5 Benchmark

To display and evaluate the prediction made for a target, the benchmark has to be run. It allows the predicted model to be compared with the real structure.

2.5.1 Superpose (local)

Superpose is part of the MODELLER distribution. It superposes the structures in two PDB files (model and template) and calculates the RMSD between them.

Further information: http://salilab.org/modeller/manual/node100.html
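After superposition, the RMSD over N pairs of corresponding atoms with coordinates x_i and y_i is sqrt((1/N) · Σ |x_i − y_i|²). The minimal sketch below computes only this final formula. It assumes the two coordinate sets are already superposed and paired (for example, the C-alpha atoms of aligned residues). It omits the fitting step that Superpose performs.

```java
/** RMSD of two already superposed coordinate sets (sketch; the fitting step is omitted). */
final class Rmsd {
    private Rmsd() {}

    /** a[i] and b[i] are the {x, y, z} coordinates of corresponding atoms. */
    static double rmsd(double[][] a, double[][] b) {
        if (a.length != b.length || a.length == 0) {
            throw new IllegalArgumentException("need the same non-zero number of atoms");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            for (int d = 0; d < 3; d++) {
                double diff = a[i][d] - b[i][d];
                sum += diff * diff; // squared distance, accumulated per coordinate
            }
        }
        return Math.sqrt(sum / a.length);
    }
}
```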

2.5.2 CE (local)

CE is a protein structure alignment tool that calculates several scores, such as the Z-score, the RMSD and the sequence identity. It uses a heuristic called incremental combinatorial extension of the optimal path.

Further information:

/home/proj/bioprakt/genprak6/software/ce_distr/README

3. Use of jCowboy via command line

3.1. Start jCowboy

In this mode, jCowboy starts and produces a prediction for a target using one or several clients, depending on how many clients you wish to run. The target sequence is passed to the server on the command line in one-letter code.
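A target typed on the command line can be checked before it is sent to the server. The hypothetical helper below accepts only the 20 standard one-letter amino acid codes, case-insensitively. This alphabet is an assumption, since the text does not say whether jCowboy accepts ambiguity codes such as X or B.

```java
import java.util.Locale;

/** Checks that a target is given in one-letter amino acid code (hypothetical helper, not jCowboy's). */
final class TargetCheck {
    // assumed alphabet: the 20 standard amino acids only
    private static final String AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY";

    private TargetCheck() {}

    /** Returns the upper-case sequence, or throws if a character is not a one-letter code. */
    static String normalize(String target) {
        String seq = target.trim().toUpperCase(Locale.ROOT);
        if (seq.isEmpty()) throw new IllegalArgumentException("empty target sequence");
        for (int i = 0; i < seq.length(); i++) {
            if (AMINO_ACIDS.indexOf(seq.charAt(i)) < 0) {
                throw new IllegalArgumentException("not a one-letter amino acid code: " + seq.charAt(i));
            }
        }
        return seq;
    }
}
```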

Because jCowboy works as a multi-threaded system, four shells have to be opened.

3.1.1 Start the RMI registry (port 1110) for your client

Shell1: rmiregistry 1110

3.1.2 Change to the directory containing jCowboy and start the server, named ZalfRimmer

Shell2: cd <directory>

java jcowboy.server.ZalfRimmer

3.1.3 Change to the same directory and start one or more clients (one shell for each client)

Shell3: cd <directory>

java jcowboy.server.MethodClient <client, on which the server runs>

3.1.4 Change to the same directory and start the job (the prediction)

Shell4: cd <directory>

java jcowboy.server.qPrediction <client of the server> <target sequence> file

3.2 Prediction-Results

3.2.1 Result-File

The results are written to <directory> as “<jobID>.readme”, where jobID is the ID returned in Shell 4.

3.2.2 PDB-File

Shell 2, where the server runs, shows the directory and file name of the PDB file created by Modeller and the OutputController.

3.3 Benchmark

To see the results of the individual methods and to compare the predicted tertiary structure of the target with the real structure, the benchmark has to be run.

For this, a “session” has to be created. A session is a random subset of astral40 entries, with the number of sequences given by the user. These sequences are stored in a file named <sessionname>.fasta, and a prediction is made and evaluated for each of them.
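The benchmark itself is a Ruby script. The Java sketch below only illustrates the sampling idea: it draws the requested number of distinct entries uniformly at random and formats them as the content of <sessionname>.fasta. The map of entry IDs to sequences, the seed parameter and the one-line FASTA layout are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Draws a random session of database entries and formats it as FASTA (hypothetical sketch). */
final class Session {
    private Session() {}

    /** Picks up to `size` distinct entries uniformly at random (sampling without replacement). */
    static Map<String, String> sample(Map<String, String> entries, int size, long seed) {
        List<String> ids = new ArrayList<>(entries.keySet());
        Collections.shuffle(ids, new Random(seed));
        Map<String, String> session = new LinkedHashMap<>();
        for (String id : ids.subList(0, Math.min(size, ids.size()))) {
            session.put(id, entries.get(id));
        }
        return session;
    }

    /** Content of <sessionname>.fasta: one header line and one sequence line per entry. */
    static String toFasta(Map<String, String> session) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : session.entrySet()) {
            sb.append('>').append(e.getKey()).append('\n').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }
}
```

A fixed seed makes a session reproducible, which helps when benchmark runs are compared.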

3.3.1 Change to the directory containing the benchmark and create a session

Shell1: cd <directory>

ruby benchmark.rb <sessionname> create <number of sequences>

3.3.2 Change to the same directory and run the session on the server

Shell2: cd <directory>

ruby benchmark.rb <sessionname> do <client of the server>

3.3.3 Change to the same directory and evaluate the predictions

Shell3: cd <directory>

ruby benchmark.rb <sessionname> evaluate

3.4 Benchmark-Results

An HTML file with all prediction results is created and stored in /home/proj/bioprakt/genprak6/output/html/<sessionname>.html. It contains the following information:

3.4.1 Rasmol

All predicted models and the real template structure are available as RasMol images (.gif).

3.4.2 RMSD

The RMSD is calculated by two tools: Superpose (2.5.1) and CE (2.5.2).

3.4.3 Prove- and Anolea-Score

Both tools evaluate the predicted model (2.4.1, 2.4.2).

3.4.4 Further scores and values

Alignment score, and the real and predicted SCOP level

3.4.5 Gnuplots

The first plot shows the RMSD versus the relationship (SCOP level) between target and template.

The second plot shows the Anolea score versus the RMSD.

The third plot shows the Prove score versus the RMSD.

4. Explanation of the server concept, its programs and methods

4.1 MultiThreading

A single prediction for a target sequence involves several steps and various programs and tools, so time is the limiting factor. For example, the template search by profile-profile alignment has to calculate a new profile for each target. It then has to compare this profile with the profile of every entry in the astral40 database (3500 entries) and create an alignment for each.

Modeller also needs a lot of time to produce a PDB file for the target. These two steps take up most of the running time. Suppose the 10 best alignments are to be modelled and the five best models returned. Then a prediction for a single target of average length (about 200 amino acids) probably takes at least one hour.

Moreover, the prediction server is expected to be used by several people and to predict many targets at the same time.

These were the reasons for using multi-threading.

4.2 Jobs

Because parts of the prediction are run by different clients, every part has to be clearly defined. For how the jobs are passed, see the example run (4.4).
A job contains:

· the method call (for example MPsipred.java), which then calls the parser for the specific program or tool (ParsePsipred.rb),

· the ControllerID, used to return the results to the right Controller, and

· the parameters that the method or parser needs.
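The three parts of a job listed above map naturally onto a small value class. In the minimal sketch below, the field names, the integer ControllerID and the string-to-string parameter map are hypothetical. The class is marked `Serializable` on the assumption that jobs travel to the clients over RMI.

```java
import java.io.Serializable;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal sketch of a job with the three parts listed above (field names are hypothetical). */
final class Job implements Serializable {
    private static final long serialVersionUID = 1L;

    final String method;                  // method to call, e.g. "MPsipred"
    final int controllerId;               // routes the result back to the right Controller
    final Map<String, String> parameters; // parameters needed by the method or its parser

    Job(String method, int controllerId, Map<String, String> parameters) {
        this.method = method;
        this.controllerId = controllerId;
        // defensive, read-only copy so a job cannot change after it has been queued
        this.parameters = Collections.unmodifiableMap(new LinkedHashMap<>(parameters));
    }

    @Override
    public String toString() {
        return method + "@" + controllerId + parameters;
    }
}
```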

4.3 Configuration

jCowboy is designed as a fully automated prediction server that runs with a default configuration. However, to compare the quality of the results (or just for testing), it can be useful to switch individual methods “on” or “off”.

These settings and other parameters, e.g. the maximum number of Psi-Blast iterations, are stored in default.conf (the ConfigFile) and can easily be changed there.

4.4 Example Run

The following example prediction shows more clearly how our concept works.

We assume that only one target is to be predicted.

4.4.1 Configurations:

First, the tools have to be switched on or off in the ConfigFile (for details, see 4.3).

PHD: on

JPred: off

PsiPred: on

PsiBlast: off

Niksearch: on

Modeller: on

Prove: on

Anolea: off
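Switches of the form `Name: on` are accepted by `java.util.Properties`, with or without a space after the colon, because it treats `:` as a key/value separator. The sketch below assumes that default.conf uses exactly this form for the switches; the real file may contain other parameters and syntax. Under that assumption, a controller could check whether its tool is switched on as follows. The class name and the rule that missing tools count as "off" are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

/** Reads on/off switches such as "PHD: on" (hypothetical sketch of reading default.conf). */
final class ToolSwitches {
    private final Properties props = new Properties();

    ToolSwitches(Reader config) throws IOException {
        props.load(config); // accepts "key: value", "key:value" and "key = value"
    }

    /** True only if the tool is listed and set to "on"; missing tools count as off. */
    boolean isOn(String tool) {
        return "on".equalsIgnoreCase(props.getProperty(tool, "off").trim());
    }

    static ToolSwitches parse(String text) throws IOException {
        return new ToolSwitches(new StringReader(text));
    }
}
```

In practice, the `Reader` would come from `java.nio.file.Files.newBufferedReader` on the config file instead of from a string.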

4.4.2 Start

The target sequence is given to ZalfRimmer, which calls the Rapper (one Rapper per target). The Rapper calls the following Controllers:

CSecStruc (secondary structure),

CTempSearch (template search),

CHomModel (homology modelling),

CEval (evaluation).

Each Controller now looks up its tool(s) in the ConfigFile. For every tool that is switched on, a job is created.

Jobs (for details, see 4.2):

phdJob,

psipredJob,

niksearchJobs: each job comprises 25 alignments, so NikSearch produces many jobs,

modellerJobs: every alignment that is modelled is one job,

proveJob.
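Splitting the template database into NikSearch jobs of 25 alignments each can be sketched as follows. Only the job size comes from the text. The list of template IDs and the class name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits the template database into alignment jobs of 25 entries each (hypothetical sketch). */
final class NikSearchChunks {
    static final int ALIGNMENTS_PER_JOB = 25;

    private NikSearchChunks() {}

    static List<List<String>> chunk(List<String> templateIds) {
        List<List<String>> jobs = new ArrayList<>();
        for (int from = 0; from < templateIds.size(); from += ALIGNMENTS_PER_JOB) {
            int to = Math.min(from + ALIGNMENTS_PER_JOB, templateIds.size());
            // copy so each job owns its slice independently of the source list
            jobs.add(new ArrayList<>(templateIds.subList(from, to)));
        }
        return jobs;
    }
}
```

With the 3500 astral40 entries mentioned in 4.1, this gives 140 jobs. These jobs can then be spread over all available clients.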

Every job is then passed to the MethodDistributor, which distributes the jobs among the MethodClients. The results are returned via the MethodDistributor to the respective Controller. The Controller stores them in the PredictionResult (every model has its own PredictionResult). The next job retrieves these results from the PredictionResult for further processing.
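The routing described above can be illustrated in a single process. In the sketch below, a queue holds the jobs, a fixed number of worker threads stand in for the MethodClients, and each result is handed to the handler registered under the job's controller ID. This is a hypothetical sketch, not jCowboy's RMI-based implementation. `Task`, the string inputs and results, and the `method` function are illustrative assumptions.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Function;

/** In-process sketch of a distributor: workers take jobs, results go back by controller ID. */
final class Distributor {
    static final class Task {
        final int controllerId;
        final String input;

        Task(int controllerId, String input) {
            this.controllerId = controllerId;
            this.input = input;
        }
    }

    private final BlockingQueue<Task> queue = new LinkedBlockingQueue<>();
    private final Map<Integer, Consumer<String>> controllers = new ConcurrentHashMap<>();

    /** The handler is called from worker threads, so it must be thread-safe. */
    void register(int controllerId, Consumer<String> resultHandler) {
        controllers.put(controllerId, resultHandler);
    }

    void submit(Task task) {
        queue.add(task);
    }

    /** Runs `clients` worker threads until the queue is drained; `method` stands in for a client. */
    void runAll(int clients, Function<String, String> method) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        for (int c = 0; c < clients; c++) {
            pool.execute(() -> {
                Task t;
                while ((t = queue.poll()) != null) {
                    String result = method.apply(t.input);
                    controllers.get(t.controllerId).accept(result); // back to the right Controller
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```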

Finally, when all Controllers have finished their jobs, the OutputController is called and produces the output PDB file.

4.5 Independence from other servers and online tools

To ensure that every step of the prediction can be carried out, at least one method of every Controller is installed locally. Therefore, problems with the web connection, moved URLs or unavailable servers cannot stop our server from completing its prediction.

4.6 Persistence

Several measures have been implemented to prevent interruptions from stopping the server before it completes its prediction.

4.6.1 Store-File

After each Controller has finished, the results received so far are stored in a “<jobID>.store” file in the current directory. When the Rapper has finished the whole prediction, the *.store file is deleted.

When the server (ZalfRimmer) is started, it first checks whether a *.store file exists in the current directory. Its existence is a clear sign that the previous prediction was interrupted. In this case, the store file is read and the results obtained so far are reloaded. The file also contains the name of the last completed Controller, so the prediction continues with the Controller that follows it.

If no such file exists, the server starts normally.
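The resume rule can be sketched as follows. The Controllers are assumed to run in the order listed in 4.4.2. After each one, its name is recorded, and on restart the prediction continues with the next Controller. To keep the sketch short, it stores only the Controller name; the real store file also holds the results. The single-line file layout, the restart from the beginning for an unknown name, and the class name are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Resume logic around a "<jobID>.store" file (hypothetical sketch: only the Controller name is stored). */
final class StoreFile {
    static final List<String> CONTROLLERS = List.of("CSecStruc", "CTempSearch", "CHomModel", "CEval");

    private StoreFile() {}

    /** Called after each Controller has finished. */
    static void markDone(Path store, String controller) throws IOException {
        Files.write(store, controller.getBytes(StandardCharsets.UTF_8));
    }

    /** Index of the first Controller still to run; 0 if there is no store file. */
    static int resumeIndex(Path store) throws IOException {
        if (!Files.exists(store)) {
            return 0; // normal start
        }
        String last = new String(Files.readAllBytes(store), StandardCharsets.UTF_8).trim();
        int done = CONTROLLERS.indexOf(last);
        return done < 0 ? 0 : done + 1; // continue after the last completed Controller
    }

    /** Called when the whole prediction is finished. */
    static void finish(Path store) throws IOException {
        Files.deleteIfExists(store);
    }
}
```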

4.6.2 Client-Reconnect

If the server “through mysterious circumstances suddenly vanishes”, the clients keep running but could not be used by a restarted server. Regular reconnecting (every 30 seconds) avoids this.
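One way to realize this is a client loop that retries the connection at a fixed interval until it reaches the (restarted) server. In the sketch below, the connection attempt is passed in as a `BooleanSupplier`, for example a wrapper around an RMI registry lookup. The attempt limit and the class name are hypothetical. The 30-second interval from the text would be passed as 30000 ms.

```java
import java.util.function.BooleanSupplier;

/** Retries a connection attempt at a fixed interval (hypothetical sketch of the client reconnect). */
final class Reconnect {
    private Reconnect() {}

    /**
     * @param tryConnect  one connection attempt; returns true on success
     * @param delayMillis pause between attempts, e.g. 30000 for 30 seconds
     * @param maxAttempts give up after this many attempts
     * @return number of attempts used, or -1 if all attempts failed
     */
    static int connect(BooleanSupplier tryConnect, long delayMillis, int maxAttempts)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (tryConnect.getAsBoolean()) {
                return attempt;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(delayMillis); // wait before the next attempt
            }
        }
        return -1;
    }
}
```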

4.6.3 Job-Recovery

If a client is shut down, for example, the server queues its job again. In this way, the job is not lost and the prediction can still be completed.
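The recovery rule can be shown in a single-threaded sketch. Jobs are assigned to clients in round-robin order. If a client fails, which here means it throws an exception, that client is dropped and the job is put back at the front of the queue. This sketch is hypothetical: the clients are plain functions standing in for MethodClients, and jCowboy's server does this across RMI.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Function;

/** Re-queues a job when its client fails (hypothetical, single-threaded sketch of job recovery). */
final class JobRecovery {
    private JobRecovery() {}

    /**
     * @param jobs    jobs to run
     * @param clients stand-ins for MethodClients; a thrown exception means the client went down
     * @return results in completion order
     */
    static List<String> runAll(List<String> jobs, List<Function<String, String>> clients) {
        Deque<String> queue = new ArrayDeque<>(jobs);
        List<Function<String, String>> alive = new ArrayList<>(clients);
        List<String> results = new ArrayList<>();
        int next = 0;
        while (!queue.isEmpty()) {
            if (alive.isEmpty()) {
                throw new IllegalStateException("no clients left for " + queue.size() + " jobs");
            }
            String job = queue.poll();
            int idx = next % alive.size();
            try {
                results.add(alive.get(idx).apply(job));
                next++; // round-robin to the next client
            } catch (RuntimeException clientDown) {
                alive.remove(idx);   // stop using the failed client
                queue.addFirst(job); // queue the job again so it is not lost
            }
        }
        return results;
    }
}
```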

5. jCowboys

jCowboy was created during a practical course at the LMU (Ludwig Maximilian University of Munich) as part of the Bioinformatics degree programme (November 2002 – February 2003). It will be continuously improved.

The team:

Carla Cantzler

Maximilian Karasz

Robert Körner

Thomas Obkircher

Andreas Spitzmüller

Simon Tietze