Retrieve Protein structure

July 13, 2021

How to retrieve protein sequence data from the UniProt database and protein structures from the Protein Data Bank?

(A step-by-step guide - Replico)

What is a biological database?

A biological database is a computerized archive used to store and organize biological data in such a way that information can be retrieved easily via a variety of search criteria.

Types of biological databases

Primary databases: contain original biological data. They are archives of raw sequence or structural data submitted by the scientific community. Example: Protein Data Bank.

Secondary databases: contain computationally processed or manually curated information, based on original information from primary databases. Example: UniProt, SCOP and CATH.

Composite databases: collections of multiple databases under a single umbrella. Example: NCBI.

In addition, the scientific community often develops databases that cater to particular research interests. These are often called specialized databases.

UniProt https://www.uniprot.org/

The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data.

The UniProt database comprises the UniProt Knowledgebase (UniProtKB), the UniProt Reference Clusters (UniRef), and the UniProt Archive (UniParc).

The UniProt Knowledgebase consists of two sections: UniProtKB/Swiss-Prot, which contains manually curated, non-redundant records, and UniProtKB/TrEMBL, which contains automatically annotated records that have not been manually reviewed.

UniParc is designed to capture all publicly available protein sequence data. It is a publicly accessible non-redundant protein sequence database.

The mission of UniProt is to provide the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information.

PROTEIN SEQUENCE

HOW TO RETRIEVE PROTEIN SEQUENCE FROM UniProt DATABASE?

→ Choose the dataset to search from the dropdown (shown in the image): ‘UniProtKB’, ‘UniRef’ or ‘UniParc’. I am selecting ‘UniProtKB’.

→ Enter your query protein in the search box. https://www.uniprot.org/uniprot/?query=insulin&sort=score

You can see all the search results in the green box. You can filter the search results from the left side.
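The same search can also be run without the browser. Below is a minimal Python sketch, assuming the UniProt REST search endpoint at rest.uniprot.org and its `query`, `format`, `fields` and `size` parameters. The function names `build_search_url` and `search_uniprot` are my own, made up for illustration, and the chosen result columns are just one possible selection.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the UniProtKB search service.
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(query, fields=("accession", "protein_name", "organism_name"), size=5):
    """Return a UniProtKB search URL that asks for a tab-separated result table."""
    params = {
        "query": query,              # same text you would type in the search box
        "format": "tsv",             # tab-separated values, easy to read or parse
        "fields": ",".join(fields),  # columns to include in the table
        "size": size,                # maximum number of hits to return
    }
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"


def search_uniprot(query, **kwargs):
    """Run the search (needs internet access) and return the table as text."""
    with urlopen(build_search_url(query, **kwargs), timeout=30) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only builds and prints the URL; call search_uniprot("insulin") to fetch results.
    print(build_search_url("insulin"))
```

Calling `search_uniprot("insulin")` should return a small table whose first column holds the entry IDs, the same identifiers you click on in the web interface.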

→ Click on the "Entry Id". It will redirect you to another page.

https://www.uniprot.org/uniprot/P08069

→ This page has all the information regarding the sequence. At the top, there is a "Format" option.

→ Select the format you want. I am selecting "FASTA".

FASTA is a very popular format used by most software. A FASTA record starts with a greater-than (>) symbol. This first line contains relevant information about the sequence, such as the entry ID or sequence identifier and a description of the sequence. The lines that follow contain the sequence itself.
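To make the layout of a FASTA record concrete, here is a small Python sketch that splits FASTA text into header and sequence. The function name `parse_fasta` is hypothetical, and the example record is made up for illustration; it is not a real UniProt entry.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are joined into one string
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up record for illustration only; not a real UniProt entry.
example = """>sp|X00000|DEMO_HUMAN Hypothetical demo protein OS=Homo sapiens
MKTAYIAKQR
QISFVKSHFS
"""

for header, sequence in parse_fasta(example):
    # UniProt-style headers separate database, accession and entry name with "|".
    accession = header.split("|")[1] if header.count("|") >= 2 else header.split()[0]
    print(accession, len(sequence))
```

For real work, an established library such as Biopython is usually a better choice than a hand-written parser; the sketch is only meant to show what the format contains.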

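→ You can copy all the text and save it in a plain-text file (for example with a .fasta or .txt extension), so that it can be used later for other purposes such as sequence alignment or homology modelling. A word processor such as MS Word or Google Docs can add formatting that other software may not read correctly.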
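Instead of copying the text by hand, the record can also be downloaded with a few lines of Python. This is a minimal sketch, assuming the UniProt REST service serves FASTA at `https://rest.uniprot.org/uniprotkb/<accession>.fasta`. The helper names `fasta_url` and `download_fasta` are hypothetical, and P08069 is the entry opened in the steps above.

```python
from pathlib import Path
from urllib.request import urlopen


def fasta_url(accession):
    """URL of the FASTA record for one UniProtKB accession (assumed REST endpoint)."""
    return f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def download_fasta(accession, folder="."):
    """Download the record (needs internet access) and save it as <accession>.fasta."""
    with urlopen(fasta_url(accession), timeout=30) as response:
        text = response.read().decode("utf-8")
    path = Path(folder) / f"{accession}.fasta"
    path.write_text(text, encoding="utf-8")  # plain text, no extra formatting
    return path


if __name__ == "__main__":
    # Only prints the URL; call download_fasta("P08069") to actually save the file.
    print(fasta_url("P08069"))
```

The saved file is plain text, so it can be passed directly to alignment or modelling tools.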

PROTEIN STRUCTURE

HOW TO GET PROTEIN STRUCTURE FROM THE PROTEIN DATA BANK?

This resource is powered by the Protein Data Bank archive: information about the 3D shapes of proteins, nucleic acids, and complex assemblies that helps students and researchers understand all aspects of biomedicine and agriculture, from protein synthesis to health and disease.

→ You can search "protein data bank" on Google, or you can click on this link, https://www.rcsb.org/, which will take you to the PDB homepage.

→ There are two ways to search for a structure. You can search directly on Google, which gives you a variety of options to choose from. The keyword I have used is "insulin structure PDB". Then you can select the entry of interest.

Or you can go to the PDB, enter your query in the search box and click on the search button. This is the standard way of doing it. The query I have used here is "Structure of human insulin". https://www.rcsb.org/structure/3E7Y

Note: you can also apply filters while searching for a query. All the filters are on the left side of the page. Filters will help you to get more precise results.

→ Click on the PDB entry of interest. It will take you to another page. I am selecting the first entry, with PDB code "3E7Y", as you can see in the image.

→ You can click on the "structure" or select the "3D View" at the top. It will take you to another page, where you can visualize the structure more clearly.

https://www.rcsb.org/3d-view/3E7Y/1
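If you want the coordinates on your own computer rather than only in the browser viewer, the entry can be downloaded as a PDB-format file. The sketch below assumes the RCSB file download address `https://files.rcsb.org/download/<ID>.pdb`. The function names `pdb_url`, `fetch_pdb` and `summarize_atoms` are made up for illustration. The small parser relies on the fixed-column PDB layout, where the chain identifier of an ATOM record sits in column 22.

```python
from urllib.request import urlopen


def pdb_url(pdb_id):
    """Download URL for an entry in PDB format (assumed RCSB file service address)."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def fetch_pdb(pdb_id):
    """Download the entry (needs internet access) and return the file as text."""
    with urlopen(pdb_url(pdb_id), timeout=30) as response:
        return response.read().decode("utf-8")


def summarize_atoms(pdb_text):
    """Count ATOM records per chain using the fixed-column PDB layout."""
    counts = {}
    for line in pdb_text.splitlines():
        # HETATM lines (ligands, water) are ignored; only ATOM records are counted.
        if line.startswith("ATOM") and len(line) > 21:
            chain = line[21]  # column 22 (index 21) holds the chain identifier
            counts[chain] = counts.get(chain, 0) + 1
    return counts


if __name__ == "__main__":
    # Only prints the URL; use summarize_atoms(fetch_pdb("3E7Y")) to inspect the entry.
    print(pdb_url("3E7Y"))
```

Running `summarize_atoms(fetch_pdb("3E7Y"))` should give a quick overview of how many atoms each chain of the entry contains, which is a simple check that the download worked.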