Retrieve Protein structure

How to retrieve protein sequence data from the UniProt database and protein structures from the Protein Data Bank

A step-by-step guide (Replico)

What is a biological database?

A biological database is a computerized archive used to store and organize biological data in such a way that information can be retrieved easily via a variety of search criteria.

Types of biological databases

Primary databases: contain original biological data. They are archives of raw sequence or structural data submitted by the scientific community. Example: Protein Data Bank.

Secondary databases: contain computationally processed or manually curated information derived from the original data in primary databases. Examples: UniProt, SCOP and CATH.

Composite databases: collections of multiple databases under a single umbrella. Example: NCBI.

In addition, the scientific community often develops databases that cater to particular research interests. These are called specialized databases.


UniProt https://www.uniprot.org/

The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data.


The UniProt database comprises the UniProt Knowledgebase (UniProtKB), the UniProt Reference Clusters (UniRef), and the UniProt Archive (UniParc).

The UniProt Knowledgebase consists of two sections called UniProtKB/Swiss-Prot and UniProtKB/TrEMBL. Swiss-Prot holds manually curated, non-redundant records, while TrEMBL holds computationally annotated records that have not yet been manually reviewed.

UniParc is designed to capture all publicly available protein sequence data. It is a publicly accessible non-redundant protein sequence database.

The mission of UniProt is to provide the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information.


PROTEIN SEQUENCE

HOW TO RETRIEVE PROTEIN SEQUENCE FROM UniProt DATABASE?

Choose the dataset to search from the dropdown next to the search box: ‘UniProtKB’, ‘UniRef’ or ‘UniParc’. I am selecting ‘UniProtKB’.


Enter your query protein in the search box. The query I have used here is "insulin": https://www.uniprot.org/uniprot/?query=insulin&sort=score
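The same search can also be scripted. The sketch below is a minimal example, assuming UniProt's REST service at https://rest.uniprot.org (this host, the helper names `build_search_url`, `build_entry_fasta_url` and `fetch_text`, and the example accession P01308 for human insulin are not part of the walkthrough above). It only builds the URLs; the network call is wrapped in a function so that nothing is downloaded until you ask for it.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: UniProt's REST service lives at this host.
UNIPROT_REST = "https://rest.uniprot.org"

# The three datasets offered in the dropdown on the website.
DATASETS = {"uniprotkb", "uniref", "uniparc"}


def build_search_url(query, dataset="uniprotkb", fmt="fasta", size=5):
    """Build a search URL for a free-text query such as "insulin"."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    params = urlencode({"query": query, "format": fmt, "size": size})
    return f"{UNIPROT_REST}/{dataset}/search?{params}"


def build_entry_fasta_url(accession):
    """Build the URL of a single UniProtKB entry in FASTA format."""
    return f"{UNIPROT_REST}/uniprotkb/{accession}.fasta"


def fetch_text(url, timeout=30):
    """Download a URL and return its body as text (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


print(build_search_url("insulin"))
print(build_entry_fasta_url("P01308"))
```

To actually retrieve the sequences, call `fetch_text(build_search_url("insulin"))`; the `size` parameter limits how many results are returned.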


The search results are listed in the main panel (marked with a green box in the screenshot). You can filter them using the options on the left side.
Click on the entry ID of the protein you are interested in. It will take you to the entry page.

This page has all the information regarding the sequence. At the top, there is a "Format" option.

Select the format you want. I am selecting "FASTA".
FASTA is a very popular format used by most software. A FASTA record starts with a greater-than (>) symbol. The first line contains relevant information about the sequence, such as the entry ID or sequence identifier and a description of the sequence. The following lines contain the sequence itself.
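To make the FASTA layout concrete, here is a minimal sketch of a parser in plain Python. The function name `parse_fasta` and the sample record are made up for illustration (the sample is not a real UniProt entry); for real work a library such as Biopython is the more usual choice.

```python
def parse_fasta(text):
    """Split FASTA text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]  # drop the leading ">"
            chunks = []
        elif header is not None:
            # Sequence lines are joined into one string.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up record, for illustration only.
SAMPLE_FASTA = """>sp|X00000|DEMO_TEST Hypothetical demo protein
MKTAYIAK
QRQISFVK
"""

for header, sequence in parse_fasta(SAMPLE_FASTA):
    print(header, len(sequence))
```

The header is kept as one string here; splitting it further into identifier and description depends on which database produced the file.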
You can copy all the text and save it in a plain-text file (for example with a .fasta or .txt extension). MS Word or Google Docs also work for keeping a copy, but most tools expect plain text. The file can later be used for other purposes such as sequence alignment or homology modelling.

PROTEIN STRUCTURE

HOW TO GET PROTEIN STRUCTURE FROM THE PROTEIN DATA BANK?

This resource is powered by the Protein Data Bank archive: information about the 3D shapes of proteins, nucleic acids, and complex assemblies that helps students and researchers understand all aspects of biomedicine and agriculture, from protein synthesis to health and disease.
You can search for "protein data bank" on Google, or you can go directly to the PDB homepage with this link: https://www.rcsb.org/

There are two ways to search for a structure. The first is to search directly on Google, which gives you a variety of options to choose from. The keyword I have used is "insulin structure PDB". You can then select the entry of your interest.

The second is to go to the PDB, enter your query in the search box and click on the search button. This is the standard way of doing it. The query I have used here is "Structure of human insulin": https://www.rcsb.org/structure/3E7Y

Note: you can also apply filters while searching. All the filters are on the left side of the page, and they help you get more precise results.

Click on the PDB entry of your interest. It will take you to the entry page. I am selecting the first entry, with PDB code "3E7Y".

You can click on the "structure" or select "3D View" at the top. It will take you to another page, where you can visualize the structure more clearly.

You can also click on the "Download" option and select the desired format, for example "FASTA" for the sequence or "PDB" for the structure, and then download the file.
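The download step can be scripted as well. The following is a minimal sketch, assuming RCSB's file server at https://files.rcsb.org/download/ for structure files and the path https://www.rcsb.org/fasta/entry/ for sequences; both URL patterns and the helper names `build_download_url` and `download_structure` are assumptions, not something the page itself documents. Nothing is downloaded until `download_structure` is called.

```python
import re
from pathlib import Path
from urllib.request import urlopen

# Classic PDB codes are four characters: a digit followed by three
# letters or digits, for example 3E7Y.
PDB_ID_PATTERN = re.compile(r"[0-9][A-Za-z0-9]{3}")


def build_download_url(pdb_id, fmt="pdb"):
    """Return the download URL for an entry in the requested format."""
    if not PDB_ID_PATTERN.fullmatch(pdb_id):
        raise ValueError(f"not a four-character PDB code: {pdb_id!r}")
    pdb_id = pdb_id.upper()
    if fmt in ("pdb", "cif"):
        # Assumption: structure files are served from files.rcsb.org.
        return f"https://files.rcsb.org/download/{pdb_id}.{fmt}"
    if fmt == "fasta":
        # Assumption: sequences are served from this path on rcsb.org.
        return f"https://www.rcsb.org/fasta/entry/{pdb_id}"
    raise ValueError(f"unsupported format: {fmt!r}")


def download_structure(pdb_id, fmt="pdb", directory="."):
    """Download an entry and save it as e.g. 3E7Y.pdb (needs network access)."""
    url = build_download_url(pdb_id, fmt)
    target = Path(directory) / f"{pdb_id.upper()}.{fmt}"
    with urlopen(url, timeout=30) as response:
        target.write_bytes(response.read())
    return target


print(build_download_url("3E7Y"))
print(build_download_url("3E7Y", "fasta"))
```

Calling `download_structure("3E7Y")` would save the structure as `3E7Y.pdb` in the current directory.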

→ You can then open the downloaded structure file offline using any structure visualization software.
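A downloaded PDB file is also plain text, so it can be inspected with a few lines of code before opening it in a viewer. The sketch below counts the residues in each chain by reading the fixed columns of the ATOM records (chain ID in column 22, residue number in columns 23-26, insertion code in column 27). The function name `residues_per_chain` is hypothetical, and the sample lines use invented coordinates rather than data from 3E7Y.

```python
from collections import defaultdict


def residues_per_chain(pdb_text):
    """Count distinct residues per chain from the ATOM records of a PDB file."""
    seen = defaultdict(set)
    for line in pdb_text.splitlines():
        # Only ATOM records are counted; HETATM (water, ligands) is skipped.
        if not line.startswith("ATOM  ") or len(line) < 27:
            continue
        chain_id = line[21]                 # column 22
        residue_number = line[22:26].strip()  # columns 23-26
        insertion_code = line[26].strip()   # column 27
        seen[chain_id].add((residue_number, insertion_code))
    return {chain: len(residues) for chain, residues in seen.items()}


# Toy records with invented coordinates, for illustration only.
SAMPLE_PDB = "\n".join([
    "ATOM      1  N   GLY A   1      -8.863  16.944  14.289  1.00 21.88           N",
    "ATOM      2  CA  GLY A   1      -9.929  17.026  13.244  1.00 20.00           C",
    "ATOM      3  N   ILE A   2     -10.500  18.100  12.900  1.00 19.50           N",
    "ATOM      4  N   PHE B   1       1.000   2.000   3.000  1.00 25.00           N",
    "HETATM    5  O   HOH A 101       4.000   5.000   6.000  1.00 30.00           O",
])

print(residues_per_chain(SAMPLE_PDB))
```

For a real file, read it first, for example with `Path("3E7Y.pdb").read_text()` after importing `Path` from `pathlib`, and pass the text to the function.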


Thank you for reading this article. For any help, you can put a comment below.

BY Anant Kumar (Replico)
