Extracting sequence data from PDB files with Perl

  • Proposals: 0
  • Remote
  • #5135
  • Archived

Description

Experience Level: Expert
This coursework concerns the extraction of sequence data from PDB files. To gain full marks you will have to write multiple subroutines, store them in a Perl module, and write multiple scripts that use one or more of these subroutines. However, substantial credit will be given for answers that provide just the subroutines, i.e. not within a Perl module. Credit will also be given for solutions that do not use subroutines.

Two sample PDB files can be copied to your current directory using the command cp /d/msc/u/ubcg60a/biocomp2/assessed3/*.ent . (here the final dot is important!). Note that file pdb1oai.ent contains a structure with only a single chain; code that handles this file but not the multi-chain structure in file pdb1tm.ent will gain considerable, but not full, credit.

The assignment is in several sub-sections, as follows:

Write a subroutine capable of converting 3-letter PDB codes to 1-letter PDB codes. Given a string containing a 3-letter code (e.g. "ALA"), the subroutine should return a string containing the corresponding 1-letter code (e.g. "A"). The most convenient mechanism for implementing this is using a hash. Make sure your subroutine can handle strings that are not valid 3-letter codes. Write a simple script that tests this subroutine.
Write a subroutine capable of extracting the full amino-acid sequences for all the chains in a PDB file. These sequences are found in the SEQRES section near the top of a PDB file. This subroutine should use the previous subroutine to convert the 3-letter codes in the SEQRES section into 1-letter codes. For full marks, the subroutine should return a list consisting of multiple strings, each string consisting of the amino-acid sequence for a single chain. However, you will get significant credit if your subroutine can only handle PDB files containing a single chain (e.g. PDB 1OAI). Write a simple script that tests this subroutine.
Write a subroutine capable of extracting the structural sequence(s) from a PDB file. These sequences are found in the ATOM section further down a PDB file. For full marks, the subroutine should return a list consisting of multiple strings, each string consisting of the amino-acid sequence for a single chain. However, you will get significant credit if your subroutine can only handle PDB files containing a single chain (e.g. PDB 1OAI). Write a simple script that tests this subroutine.
Write a subroutine capable of comparing the length of two sequences. The subroutine should return 1 if the first sequence is longer than the second sequence, 0 if they are of equal length, and -1 if the second sequence is longer than the first sequence. Write a simple script that tests this subroutine.
Write a script that, using the subroutines above, reads in multiple PDB files and identifies which amino-acid chains from those files are of equal/unequal length. You do not have to worry about the chain identifiers (A, B, etc.), simply print out the answers in numbered order (1, 2, etc.) from the start of the file.
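As a rough illustration of the first, second and fourth tasks, here is a minimal sketch rather than a model answer. The subroutine names (three_to_one, seqres_sequences, compare_lengths) are hypothetical, returning "X" for an unrecognised code is an assumption, and the parser takes lines already read from a file instead of a filename. It relies on the fixed-column layout of SEQRES records (chain identifier in column 12, residue names in columns 20-70).

```perl
use strict;
use warnings;

# The 20 standard amino acids. Anything else is reported as 'X' (an assumption).
my %THREE_TO_ONE = (
    ALA => 'A', ARG => 'R', ASN => 'N', ASP => 'D', CYS => 'C',
    GLN => 'Q', GLU => 'E', GLY => 'G', HIS => 'H', ILE => 'I',
    LEU => 'L', LYS => 'K', MET => 'M', PHE => 'F', PRO => 'P',
    SER => 'S', THR => 'T', TRP => 'W', TYR => 'Y', VAL => 'V',
);

# Convert one 3-letter code to its 1-letter code; invalid input gives 'X'.
sub three_to_one {
    my ($code) = @_;
    return 'X' unless defined $code;
    $code = uc $code;
    $code =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    return exists $THREE_TO_ONE{$code} ? $THREE_TO_ONE{$code} : 'X';
}

# Return one sequence string per chain, in order of first appearance.
sub seqres_sequences {
    my (@lines) = @_;
    my (@order, %seq);
    for my $line (@lines) {
        next unless $line =~ /^SEQRES/;
        next if length($line) < 20;
        my $chain = substr($line, 11, 1);        # column 12
        unless (exists $seq{$chain}) {
            push @order, $chain;
            $seq{$chain} = '';
        }
        my $residues = substr($line, 19, 51);    # columns 20-70
        $seq{$chain} .= three_to_one($_) for split ' ', $residues;
    }
    return map { $seq{$_} } @order;
}

# 1 if the first sequence is longer, 0 if equal, -1 if the second is longer.
sub compare_lengths {
    my ($first, $second) = @_;
    return length($first) <=> length($second);
}

# Small in-memory demonstration using made-up SEQRES lines.
my @demo = (
    sprintf('SEQRES %3d %s %4d  %s', 1, 'A', 3, 'ALA GLY MSE'),
    sprintf('SEQRES %3d %s %4d  %s', 1, 'B', 2, 'LYS VAL'),
);
my @chains = seqres_sequences(@demo);
print join(',', @chains), "\n";                       # AGX,KV
print compare_lengths($chains[0], $chains[1]), "\n";  # 1
```

The numeric comparison operator <=> already yields 1, 0 or -1, so no explicit if/else chain is needed for the length comparison.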
Note: If you write high-quality code for the above, you will gain a Distinction. However, you will gain additional marks if you successfully tackle some or all of the following. There is no need to attempt the following if you are under time pressures from other parts of the course. (a) Rewrite your code so that it handles chain identifiers as well as chain sequences. The preferred approach here would be to rewrite your subroutines so they return a reference to a hash. (b) Add a final subroutine that prints out "START" if two sequences differ only at the start, "END" if they differ only at the end, "BOTH" if they differ at both ends (but only at the ends), or "MIDDLE" if they differ somewhere in the middle of the sequences.
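For the structural sequence and optional extension (a), one possible approach is sketched below; it is an assumption-laden outline, not the expected solution. The subroutine name atom_sequences is hypothetical. It reads ATOM records by fixed columns (residue name in columns 18-20, chain identifier in column 22, residue number plus insertion code in columns 23-27), emits one letter each time the residue number changes within a chain, and returns a reference to a hash keyed by chain identifier. Stopping at the first ENDMDL record and mapping unknown residues to "X" are both assumptions.

```perl
use strict;
use warnings;

# The 20 standard amino acids. Anything else is reported as 'X' (an assumption).
my %THREE_TO_ONE = (
    ALA => 'A', ARG => 'R', ASN => 'N', ASP => 'D', CYS => 'C',
    GLN => 'Q', GLU => 'E', GLY => 'G', HIS => 'H', ILE => 'I',
    LEU => 'L', LYS => 'K', MET => 'M', PHE => 'F', PRO => 'P',
    SER => 'S', THR => 'T', TRP => 'W', TYR => 'Y', VAL => 'V',
);

# Return a hash reference: chain identifier => structural sequence.
sub atom_sequences {
    my (@lines) = @_;
    my (%seq, %last_residue);
    for my $line (@lines) {
        last if $line =~ /^ENDMDL/;      # assumption: use the first model only
        next unless $line =~ /^ATOM  /;
        next if length($line) < 27;
        my $name  = substr($line, 17, 3);    # columns 18-20
        my $chain = substr($line, 21, 1);    # column 22
        my $resid = substr($line, 22, 5);    # columns 23-27
        # Many atoms share one residue; add a letter only when the residue changes.
        next if defined $last_residue{$chain} && $last_residue{$chain} eq $resid;
        $last_residue{$chain} = $resid;
        $seq{$chain} .= $THREE_TO_ONE{$name} // 'X';
    }
    return \%seq;
}

# Small in-memory demonstration using made-up, truncated ATOM lines.
my @atoms = map { sprintf('ATOM  %5d %-4s %3s %s%4d ', @$_) } (
    [1, 'N',  'ALA', 'A', 1],
    [2, 'CA', 'ALA', 'A', 1],
    [3, 'N',  'GLY', 'A', 2],
    [4, 'N',  'LYS', 'B', 1],
);
my $by_chain = atom_sequences(@atoms);
print "$_: $by_chain->{$_}\n" for sort keys %$by_chain;
# A: AG
# B: K
```

Returning a hash reference keeps each sequence paired with its chain identifier, so a calling script can compare the SEQRES and ATOM sequences chain by chain.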

General Instructions: You may use any Internet or written resources you wish, but you must not get help from any individual and must not copy from each other. Plagiarism will be treated seriously. If you use or adapt code that you find on the Internet or in a book, it is vital that you fully acknowledge the source. You may subsequently be asked questions about the code you have written, so it is vital that you fully understand it.

Note that credit may be given for partial answers (including incomplete code, or a written description of the approach you would adopt). Some marks will be reserved for the quality of the written code, including code layout, variable names and comments.

*It should be done in around 2-3 days.
