TCS Innovation Labs - Hyderabad - Life Sciences
Life Sciences research in Innovation Labs - Hyderabad (ILH) follows the Innovation Theme: “Enhance Healthcare”. The lab's focus areas include:

Development of a New Gene Finding Algorithm and its Application to Microbial Genomes
Our group has developed a new algorithm called 4M (Mixed Memory Markov Model). Applied to the problem of finding genes in several prokaryotic genomes, 4M gives good results. The results are in general comparable with those of the widely used algorithm Glimmer, and in some aspects 4M performs better than Glimmer. Eukaryotic gene prediction presents a different problem. Eukaryotic genes contain introns, or non-coding regions within genes, and the presence of alternative splice sites and similar features complicates the problem further. We are working to extend the 4M algorithm to predict genes in eukaryotic genomes. The extended algorithm will then be applied to identify genes in the malarial parasite, Plasmodium falciparum.

Genome Annotation of the Malaria Parasite (Plasmodium falciparum)
Over the last couple of years, our group has worked on improving the annotation of the genome of the malaria parasite, Plasmodium falciparum. We focus mainly on predicting the subcellular localization and function classification of P. falciparum proteins. We use several methods to arrive at these predictions, either developing our own tools or using existing tools and studies. P. falciparum proteins are known to localize to a number of subcellular locations, such as the apicoplast, rhoptries and micronemes, as well as to extracellular locations such as the RBC cytoplasm, in order to carry out their function. Since the apicoplast has recently been in focus as a source of novel drug targets, we have been especially interested in classifying proteins that localize to this unique organelle. We have developed a prototype tool called TargetPf, which incorporates existing rule-based and machine learning techniques for predicting protein localization to subcellular locations.
A large number of genes are currently annotated as having "unknown function". We therefore focus on identifying putative functions for these genes. For example, we have been working closely with one of our collaborators to identify certain proteins involved in the heme biosynthesis pathway. The heme biosynthetic pathway in P. falciparum has been analysed to identify the Glutamate pathway enzymes. A probable candidate protein has been identified, using bioinformatics approaches, for one of the two missing enzymes in P. falciparum, i.e., the Glutamate tRNA Reductase (GTR) enzyme. The search for two other proteins in the pathway, "Glutamate-1-semialdehyde aminotransferase" and "Uroporphyrinogen III synthase", is currently in progress. The 3D structure of the probable GTR enzyme has been predicted using homology-based methods. Bio-Suite and MODELLER were used to build these models.

Identification of Compositionally Distinct Regions in Genomes
There have been rapid advances in sequencing technologies, aimed at increasing the throughput of genome sequencing. As a result, more than 400 bacterial organisms have been fully sequenced and their sequences are available in public domain databases. This number is likely to keep growing at an exponential pace for a considerable time. Given that most bacterial organisms are of medical or industrial significance, it becomes highly important to understand their genome sequences in terms of the functions they code for. It is known that the genomes of organisms are not uniform in composition throughout. It is also known that compositionally distinct regions in a genome usually carry interesting 'messages', that is, they hold special biological significance for the organism. Such regions could have been acquired from a different organism (popularly known as horizontal gene transfer) or might play specialized physiological roles in the organism (e.g., iron uptake from the environment). We have developed an algorithm known as 'The Centroid Method' to identify compositionally distinct regions in any given genome for any word size. This method was applied to 50 bacterial genomes. We demonstrate: (i) identification of an embedded sequence of foreign origin (i.e., from a different organism) and (ii) the ability to distinguish between genome and non-genome sequence inputs. In addition, we have investigated the gene contents of all the compositionally distinct bins identified by our method. In general, these bins contained closely related genes coding for well-defined physiological roles. In particular, phage-related components and pathogenicity or genomic island components were frequently observed. Ribosomal proteins, which are known to have a distinct composition, were also seen in these bins.
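The internals of the Centroid Method are not described on this page, so the following is only a minimal sketch of the general idea of flagging compositionally distinct regions, not the group's algorithm. It assumes fixed-size, non-overlapping windows, word (k-mer) frequency vectors, a centroid taken as the mean vector over all windows, Euclidean distance, and a z-score cutoff. The names kmer_profile, distinct_windows, window and z_cutoff are hypothetical.

```python
from collections import Counter
from itertools import product
import math


def kmer_profile(seq, k):
    """Relative frequency of every DNA word of length k in seq (order: AA.., AC.., ...)."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[w] for w in words)
    return [counts[w] / total if total else 0.0 for w in words]


def distinct_windows(genome, k=2, window=1000, z_cutoff=2.0):
    """Return start positions of windows whose word composition is unusually
    far from the centroid (mean profile) of all windows.

    Assumption: 'unusually far' means a distance more than z_cutoff standard
    deviations above the mean distance; the real method may differ.
    """
    starts = list(range(0, len(genome) - window + 1, window))
    if not starts:
        return []
    profiles = [kmer_profile(genome[s:s + window], k) for s in starts]
    n = len(profiles)
    # Centroid: component-wise mean of the window profiles.
    centroid = [sum(col) / n for col in zip(*profiles)]
    dists = [math.dist(p, centroid) for p in profiles]
    mean = sum(dists) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in dists) / n)
    if sd == 0:
        return []
    return [s for s, d in zip(starts, dists) if (d - mean) / sd > z_cutoff]
```

Calling distinct_windows(sequence, k=3, window=5000) on a genome string would return the start coordinates of candidate regions; the word size k corresponds to the "any word size" mentioned above, while the window size and cutoff are illustrative choices.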
The outcomes of our study reinforce the efficacy of the centroid method in identifying compositionally distinct regions in a given genome.

Identification of Proteins Involved in the Cell Cycle of Entamoeba histolytica
Entamoeba histolytica is a parasite that proliferates in the human intestine and is known to cause amoebic dysentery in humans. The sequencing of its genome was completed in 2005, and it was predicted to possess around 9900 genes. Of these, 49% were found to code for hypothetical proteins.

Application of Genome Informatics to Understand Bacterial Secretion Systems
Bacterial pathogens often interact with their host by secreting virulent molecules across their cell envelope into the target cells. Their pathogenicity depends critically on machineries that mediate the transport and injection of toxic molecules into host cells. These specialized machineries are called Secretion Systems. There are six such secretion systems, Type I through Type VI, which carry out the process of translocation. Our focus is to understand the mechanism of each of these six secretion systems in various bacteria using genome informatics. We have carried out a detailed analysis of the recently discovered Type VI Secretion System and have also created a database and a web-based tool to identify and analyse the components of all the secretion systems.

In silico ADME Modelling
Over the last few years, research has been carried out on applying statistical methods to the prediction of absorption, distribution, metabolism and excretion (ADME) of drugs and drug-like compounds. This includes applying new mathematical methods to the prediction of ADME profiles for the first time in the literature; the performance of the resulting predictive models is benchmarked against other published approaches.
Investigation of the causes of late-stage failures in drug development revealed that inappropriate ADMET (Absorption, Distribution, Metabolism, Excretion and Toxicity) properties were responsible for these failures. The later these failures are identified, the higher the cost of development. Often, the liabilities affecting a compound are not identified until it reaches the clinic, the most expensive phase of Pharma R&D.
Over the last few years, a number of in vitro and high-throughput methods have been developed to provide experimental evaluation of these properties as early and rapidly as possible.
However, it is important to note that for some assays, the time, throughput or compound requirements make it impossible to make the desired measurements on all desired compounds. Consequently, interest in in silico estimates has increased dramatically relative to their in vitro or in vivo counterparts. In silico models are useful for rationalizing a large number of experimental observations, offer potential for virtual screening applications, and can consequently help reduce the time and cost of the drug discovery and development process. The potential of in silico ADME models has created enormous interest among researchers in the pharmaceutical industry and academia, and it remains an area of intense research.

Bio-Appliance
Biology is in the midst of a transformation from an observation-driven science to one that is data-driven. This has been made possible by technological advances that have resulted in vast amounts of data being produced on a continual basis. An attendant problem is the need for appropriate categorization and curation of the data. There is thus a need to provide this data to life scientists in academia and the pharma/biotech industries in a form that is easily analysed and understood. The production of new and heterogeneous data also necessitates the refinement of existing analytical tools as well as the development of newer algorithms. Currently, much of the data is available only from individual sites, and not all of them have the same level of curation. Furthermore, in many cases the accompanying analysis has been performed with tools that are outdated. ‘Bio-Appliance’ has been developed to address these challenges and thereby help researchers and academics. The prime features of Bio-Appliance are:
In this manner, end users can be assured that they have not only the latest data but also an up-to-date analysis of the data. Additionally, the same analytical software will also be available for interactive use, either as database-integrated functions and/or as server-side applications.

Bio-Suite™
TCS has implemented a comprehensive and portable software suite for Computational Biology and Bioinformatics. TCS, in collaboration with the Council of Scientific and Industrial Research (CSIR), Government of India, and several leading Indian academic institutes, undertook this distinctive project in 2002 under the New Millennium Indian Technology Leadership Initiative (NMITLI) program. This product, Bio-Suite™, which required 100 person years, was completed over 2 years by a team of 40 software engineers with backgrounds in computer science, engineering, life sciences, statistics and mathematics. Launched by the former President of India, Dr. A.P.J. Abdul Kalam, the product won the NASSCOM Innovation award in 2005. It also won the Federation of Andhra Pradesh Chambers of Commerce and Industry (FAPCCI) award in the category of Best New Product Innovation for the year 2004. In addition to this product, some of the computationally demanding algorithms have been implemented to run on clusters of Linux machines in the product Bio-Cluster.

Signal Transduction databases and Natural Language Processing