Kammlah Diane M
Abstract

Background: Cattle fever ticks, Rhipicephalus (Boophilus) microplus and R. (B.) annulatus, vector bovine and equine babesiosis and have expanded significantly beyond the permanent quarantine zone established in South Texas. Currently, no vaccines are approved for use in the United States to control these vectors. Vaccines developed in Australia and Cuba and based on the midgut antigen Bm86 have shown variable efficacy against cattle fever ticks. One possible explanation for this variation is amino acid sequence divergence between the recombinant Bm86 vaccine component and the native Bm86 expressed by ticks from different geographical regions of the world.

Results: Bm86 sequences from R. microplus and R. annulatus collected from South Texas infestations shared 91.8% amino acid identity. Compared with the Australian Yeerongpilly and Cuban Camcord vaccine strains, the South Texas isolates showed 89.8% and 90.0% identity, respectively. Most of the sequence divergence was concentrated in one region of the protein, amino acids 206-298. Hydrophilicity profiles indicated that two short regions of Bm86 (amino acids 206-210 and 560-570) appear more hydrophilic in South Texas isolates than in the vaccine strains. Within two previously described B-cell epitopes, only one amino acid difference was found between South Texas and vaccine strains. A total of four amino acid differences were observed within three peptides previously shown to induce protective immune responses in cattle.

Conclusions: Sequence differences between the South Texas isolates and the Yeerongpilly and Camcord strains are spread throughout the entire Bm86 sequence, suggesting that geographic variation does exist. Differences within previously described B-cell epitopes between South Texas isolates and vaccine strains are minimal. However, short hydrophilic regions found only in South Texas isolates suggest that additional unique surface-exposed ...
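The identity and hydrophilicity comparisons summarized above can be illustrated with a short script. This is a minimal sketch under stated assumptions, not the authors' pipeline. It expects sequences that are already aligned (equal length, `-` for gaps). It uses the Hopp-Woods hydrophilicity scale with a caller-chosen window, because the abstract names neither the scale nor the window size. All helper names are hypothetical.

```python
# Minimal sketch: percent identity of a pre-aligned pair and a sliding-window
# hydrophilicity comparison. ASSUMPTION: Hopp-Woods scale; the abstract does
# not state which scale or window the authors used. Names are hypothetical.

# Hopp-Woods (1981) hydrophilicity value for each standard residue.
HOPP_WOODS = {
    "R": 3.0, "K": 3.0, "D": 3.0, "E": 3.0,
    "S": 0.3, "N": 0.2, "Q": 0.2, "G": 0.0, "P": 0.0,
    "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0, "M": -1.3,
    "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns where both sequences share a residue.

    Inputs must already be aligned (equal length, '-' marks a gap).
    Columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    # Both-gap columns were removed above, so x == y here means a shared residue.
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def hydrophilicity_profile(seq: str, window: int = 6) -> list:
    """Mean Hopp-Woods value for each window; index i covers residues i+1..i+window."""
    values = [HOPP_WOODS[res] for res in seq.upper()]
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and the sequence length")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def more_hydrophilic_windows(seq_a: str, seq_b: str, window: int = 6,
                             min_delta: float = 1.0) -> list:
    """1-based window starts where seq_a is more hydrophilic than seq_b by min_delta.

    Assumes ungapped sequences of equal length sharing the same numbering.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must share the same numbering")
    prof_a = hydrophilicity_profile(seq_a, window)
    prof_b = hydrophilicity_profile(seq_b, window)
    return [i + 1 for i, (a, b) in enumerate(zip(prof_a, prof_b))
            if a - b >= min_delta]


if __name__ == "__main__":
    # Toy peptides for illustration only; these are not Bm86 sequences.
    print(percent_identity("MKTAYIAKQR", "MKTAYLAKQR"))            # 90.0
    print(more_hydrophilic_windows("AAAKKK", "AAALLL", window=3))  # [2, 3, 4]
```

Percent-identity conventions differ, for example in whether gap columns count toward the denominator. This sketch counts every column that is not a gap in both sequences, so its figures may differ slightly from those of other alignment tools.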
Rodrigo Casquero Cunha
The cattle tick Rhipicephalus (Boophilus) microplus causes major economic losses. It is controlled mainly with chemicals, an approach limited by the development of resistance to the active compounds. Vaccines may help control this parasite and thereby reduce acaricide use. With this in mind, we subcloned the gene encoding Bm86-CG, a homologue of the protein that currently forms the basis of the vaccines developed against cattle ticks (Gavac™ and TickGardPLUS), into the pPIC9 expression vector for transformation of the yeast Pichia pastoris. The protein was characterized by expression in the recombinant Mut+ strain, which produced larger quantities of protein. The expressed protein (rBm86-CG) was recognized in Western blot assays by anti-Gavac, anti-TickGard, anti-larval extract, and anti-rBm86-CG polyclonal sera. Serum from cattle vaccinated with the rBm86-CG antigen showed high antibody titers and recognized the native protein. rBm86-CG has potential as an immunogen for vaccine formulations against cattle ticks.
protein Mastigocoleus testarum MLEQIELKPNWERNQVAFLDFIVNGTSLHDQFDHPQVRDLCTVFTSDQYEFDGKSSAAIHASWFLGYGETPFPDDRIPVYICSSGDFDCGTVTAYLTVNDGTIKWSEFRIERLTEELQDQPIELTSVKQCVFERNAYEKLFQPFLRKVID
hypothetical protein Synechococcus sp. WH 7803 MSRQRFRGLYLQNTGHPLCFSFVTYTPQTREQMVACGDLRADEEYFSPVLFDFLLFVSEGILGASPGVAFPFGYDDLAIVASRIRGTGVQHEYLIAINASAWNESKQAVLQQLRDILSRDLWDGARLRRGNDHPSPSE
hypothetical protein Rivularia sp. PCC 7116 MAEDNNLTNNSATNISSESQTLNKDIEELVTRQAKAWENADSEAIIADFAENGAFIAPGTSLKGKADIKKAAEDYFKEFTDTKVKITRIFSDGKEGGVEWTWSDKNKKTGEKSLIDDAIIFEIKDGKIIYWREYFDKQTVSS
predicted protein Chlamydomonas reinhardtii MSSRPKRAASANMANVIAAEKANKAAALHAWPKMWATKLEAQLQLMFMPTRLHRRPLHQGTCRNYSTAPGITGVIELTSAFYRMYPNATFVFNKETAAKGTYRGEEETAASWWLKHVGSKLEIYLSPLRCRPEVSR ...
...ical protein Prochlorothrix hollandica MYENERDNERENEYDLISPVEILPVIVARAIAPPSPPATTPDDPERVYESENEREDESISPVEILPVIVARAIA...PPSPPSTAPDDPEDEYERGDEREDEYEDEAISPVEILPVIVARAIAPPSPPATAPDEDAAAPDENEDEYEEI
hypothetical protein Fischerella sp. JSC-11 MHYYVHPFQLELHKLENMIVHVQHVNNQEVKQIADSRLFTSQAIGEEGGDTVTTKAIGEEGGDTVTTQAIGEEGGDTVTTKAIGEEGGDTVTTQAIGEEGGDTVTTQAIGEEGGDTVTTKAIGEEGGDTVTTLAFGEEGGF
hypothetical protein Calothrix sp. PCC 7103 MDYVHPFQMELHKLESMIVHVQYADIKEVDKTLASNDAVSTQAVGEEGGTKVSTRALGEEGGNILTTYAVGEEGGNILTTYAVGEEGGDKVTTQAVGEEGGTRVTTYAVGEEGGGRVTTKAVGEEGGSIIRR
hypothetical protein Microcystis aeruginosa PCC 9806 MMEDIVWKMQQRSRTLQDYRKDIRGLWQDEAAKTLNRRYLDPHEDDDQKMIEFLQKQVQGLEKTNEELVKAKDYALEAERYSQQVEHFLEREKQEVKQAYYSYDRSIEYYGLTQAELPNIHRLIQQANRSCN ...
hypothetical protein Anabaena sp. PCC 7108 MTVRFLLDSNIISEPSRPIPNIQVLDQLNRYRSEVAIASVVVHEILYGCWRLPPSKRKDSLWKYIQDSVLNLPVFDYNLNAAKWHAQERARLSKIGKTPAFIDGQIASIAFCNDLILVTNNVADFQDFQDLVIENWFI
unnamed protein product, partial Ostreococcus tauri MRSFVLIIHASASYDKIRSCTPATRYACDVRSNLKRAALGDVQPPLGLVLAALEIIFVPRADDARVTHGLFEQPIEEALLLPGLRARYSSRQSKSHVTSHDPRLDPPQIHHPAPVRYHPIASPSX ...
hypothetical protein Microcoleus vaginatus MSEIPAEQTQTNLTTPEITTESSISGVENVKNSLGNVLNSWKLKVGVAVVVLFAVSLFAFYWQHIIAVVGMKSWSARSGANPIECMVRDTNNDQYVSCSALLDQQIVPLECSSSLFNIGCRVNYGTAAANPRQTNPR
predicted protein Ostreococcus lucimarinus CCE9901 MSTRRPTTRARADDGFARDDDEDDGAHDDVAANTIVVYTKPGCCLCDGLKDKLDAAVDAAARAPPGASL...ECLRDFALCVRDVSTNAAWAESYAGSVPRVFVRVAVDAASTERSSVVSREFARPPPKRAAARVAEDLASLVRRACAPARAGWTVVTTTAWDAPSSSF ...
nickel incorporation protein HypA Oscillatoria acuminata PCC 6304 MHEVSLMENTLNIALDCASAQNASKIHRLKMRVGDLSGVVPDALEFAFDVVTRGTIAEGAKFEIERVPVVCHCSTCDRNFEPIDLFYECPHCHQLTYQIQSGQEIELTSLEVS ...
protein of unknown function DUF1818 Gloeocapsa sp. PCC 7428 MERVVKSGTGWRVGWNPHAAKYQALIGTDDWAIELTSAEFYDFCRLAVQLTEAIAQISQELMDEEKISCEAESDLVWMEVTGYPHAYSLHFILHTGRGVEGTWTPQAVPHLIQAVQMIQVF
hypothetical protein Cylst_3595 Cylindrospermum stagnale PCC 7417 MNKNNIQNYRFVCTLTFGDIYGQIIVWLITITISLASALALMGARRPVYALVTVGLVVLLTLPFLLFAFVTTLINHIELTSIEPGTKMEPIPGNVSQQQPIQASS
50S ribosomal protein L20 'Nostoc azollae' 0708 MTRVKRGNVARKRRNKILKLAKGFRGSHSTLFRTAHQQVMKALRSAYRDRKKKKRDFRRLWITRINAASRQNGLSYSQLIGNLKKANVELNRKMLAQLAVLDPASFAKVAELANSVKA
hypothetical protein Arthrospira platensis NIES-39 MGGGNGTQQHHLSVMLGFTLVSPNLPRAIAYLSRLGGGNGTQQHHLSVMLGFTLVSPKLPRAIA...YLSRLGGGNGTQQHHLSVMLGFTLVSPNLPRAIAYLSRLGGGNGTQQHHLSVMLGFTLVSPNLQMLGETRATKSDRLFK
hypothetical protein Fischerella sp. PCC 9605 MLTSYNIKDYEKAIFDFSKAIALEPNNPINHYERGNAYFLLKDYQRAIA...DYTKVLELKTNDANAYELRGFAYFYLKGYQRAIADYSKAIELKPNNTNAYVLRGLAYYKLLDYQKAITDVQQASRLYYQQNNREGFQKAEDLLQELQSLINN
hypothetical protein Arthrospira platensis NIES-39 MEVRNPTTPSLNDVGFHGVSPNLQRAIAYLSRLGGGNGTQHHHLSVMFGFTLVSPNLPRAIA...YLSRLGGGNGTQQHHLSVMLGFTLVSPKLPRAIAYLSRLGGGNGTQHHHLSVILGFTLVSPNLQMLGETQATKSDRLFK
hypothetical protein, partial Planktothrix agardhii MKFINPKTDYAFKKIFGSDQSQDILISFLNAIVYQGETFITYLEIIDPYAPGRISGLKTT...YFDVKAQLNNGENVLIEMQAFNVPAFGKRILYNTAKMYVNQLKLGEVYPELRAAIGVAVTDFIMFNEHNKVISQFTLKEDELQVNYQHSPLKLVFVELPKFNKTLEELTTITDKWLYFLRKAPDLEVVPESMLIVPEIEKAFTIADRVNLSLEEVDDLEKREQFERERIGAIELG
drug ABC transporter ATP-binding protein Planktothrix prolifica MNWAIEVKDSASMSSLNPVVATQNLGKFYRTGFWMNQKIESLKSC...QMRQYSKGMLQRVGMAQALINNPEVVFLDEPMSGLDPMGRYQIREIILSLKAQNKTVFFNSHVLSDVEKICDRIAILAEGE
cobalamin biosynthesis protein CobW Planktothrix prolifica MANLDVETPDFVLNIPKRGMPVTIITGFLGSGKTTLLNQILKNKQDLKIAVL...VNEFGDINIDSQLLISTDDDMVELSNGCICCTINDGLLDAVYRVLEREDRIDYMVIETTGVADPLPIILTFVGTELRELTNLDSVLTVIDAEAFTPEHFDSEAAFKQIVFGDIILLNKT
hypothetical protein, partial Planktothrix agardhii STPQGSTEDGPVAVPTQVQVQTEDGDVWQDVASPTSDNTDEKGRYYTTLSEYLERNKERH...ENRVFYCTDETTQATYIQLHTSQGLEVLFFDSFIDSHFISFLEREHTDVKFARVDAELDDNLIAKDNSPEIVDPKTNKTRSEIIKDLFTAALNKPKLTIRTESLKSEN
hypothetical protein CHLREDRAFT_180911 Chlamydomonas reinhardtii MTTEEPLSCSKIRSWNITVYSFTLKGLPGCLEPSHSFWVKEREGEWGLKCLSETFSHELVENVPGREEVSNLLKKGGSSNKSQKGGWICCERNCFLCQHKKCQVLI ...
predicted protein Populus trichocarpa MMINVVFAADSGLGSDAVFAADAEIGSDAVFAADSGLGSDAVFAADAEIGSDAVFAADSGLGSDAVFAADSGLVFAA...DSGLGNDAVFAADSGLGSDAVFAADAEIDSDAVFAADSGMGSDAAFAADSGLGSDAVFAADAEISSDAVFAADSGLGSDAVFAA...HFLIGSDAVFAADAEIGSDAVFAADAEIGSDAVFAADFSMNSDAELGGRGKTDFR ...
hypothetical protein ANA_C11310 Anabaena sp. 90 MSVGAKHLEDELSVIAKNSSPNASPVQLLVGGKHLEDKLSVIAKNSSPNASPVQLSVGAKHLEDKLSVIAKNSSPNASPVQLSVVICPENNPKEQIICAVICGVEEKSNEITAIPELIKVLDMTGCLYSTHLNYLI
nitrogen fixation protein Microcoleus sp. PCC 7113 MTATNTTTETTTEEILAKPLMQELVKQIRGQDSYGTYRTWSDELLLKPFIVSKERKRKISVDGDVDLVTKARI...MAFYRAIAARIEQETGLLGQVIIDLSHEGFGWALVFCGRLLVVAKTLRDAQRFGFDSFEKLDAEGEKLLQKGIDLAKRYPEVGNI
hypothetical protein Calothrix sp. PCC 7507 MYSQQLRTAIYGFFKRSHHLQLNCIDLQLNCIDLQLSCIDLQLSCIDLQLSCIDLQLNCIDLQLSCIDLQLSCIDLQLSCIDLQLSCIDLQLSCI...DLQLSCIDLQLSCIDLQLSCIDLQLSCIDLQSLNYPFLHFTDSFFVVMGTLKI
hypothetical protein Cal7507_4468 Calothrix sp. PCC 7507 MYSQQLRTAIYGFFKRSHHLQLNCIDLQLNCIDLQLSCIDLQLSCIDLQLSCIDLQLNCIDLQLSCIDLQLSCI...DLQLSCIDLQLSCIDLQLSCIDLQLSCIDLQLSCIDLQLSCIDLQLSCIDLQSLNYPFLHFTDSFFVVMGTLKI
hypothetical protein Oscillatoria sp. PCC 10802 MGHIPSSSFPDNRRAFSLWLWWGGLIEHRHVVAKIWALEIPGMNPTPQPPPRVRGGGERD...GFGGGGLIEHRHVVAKISALEIPGMNPAPQPPPRVRGGGERDGFGGGGFIDIRHVVAKIAGEPAPTNHRHPVSGEAADATHYIYADFEG
hypothetical protein PCC8801_3595 Cyanothece sp. PCC 8801 MSIQYLLDENLPHLYREQLLRLKSDLTVWIIGDPGVPPKSTLDPEILIWCEQNKFILVTNNRASMPVHLADHLSQNRHIPGIFVLRPKASIGEIIDDLILIDELGNPQDYQDCISHIPFI
hypothetical protein CHLREDRAFT_123820, partial Chlamydomonas reinhardtii RVQCRLVDMPAPCLPPFLPTCPHKPRRIPMPCTDAH...ELVDMPAPCLPPFLPDNLPARAPQAPHAVTDAHECMQCRLVDMPAPCLPPFLPKCPHKPRRLPMPCTDAHECNMPAPCLPPFLPKCPHKPRRLPMPCTDAHECMQCRLVDMPAPCLPAFLPNCPHKPRRLPMPCTDAHECSAGW ...
hypothetical protein VOLCADRAFT_120454 Volvox carteri f. nagariensis MLVTTRSHRQVSLDGGVLPPEEIKQLASLRRQQQADLAKDSNIVQGALEEAQLITWPTREKALLDTVLVLFIVAGSGAMIFGMNVLLAELSEWWYHLA ...
DKRLFSQAVVQFQKALKAKDLEGDENTALIYNGLGYAHAAQEQYDIAIRQYKEALKLKPDYVVAFNNLGFAYEKKQLSAQALEAYESALALDPNNPTAKRRVEPLRKLYAPSAS ... hypothetical protein Geitlerinema sp. PCC 7407 MDNSLTLSYLSLLLLLLAVASFFILRQVIRTRRTESTLSRLQNALKGGQGSAKDHYELGGIYL
LLIFVFVISMSLVLIITSGKPPSFPASIPDQILGLLGISSTSYVLGKALQGIKPSSAEEPSSVATAASPPPSPEPQSYDGQDLPPS ... hypothetical protein Calothrix sp. PCC 7507 MPNINDLFTILALVIGCIITIFVAVLEIFILIFIWDGTWNVNTRKGKRRGINLEKLISESSGDASLARFQ
hypothetical protein PMT9312_1418 Prochlorococcus marinus str. MIT 9312 MKFLTTLFLKLLLLSNFVIAETIPTKSNILKQSSECIKDSQNQICRELVSKLEKLQYVVFDQNRFKCQSSLLGLQSELIEAYFLKSLSKKRISFMIPYVIKNC
hypothetical protein Synechococcus sp. PCC 7336 MADRNESFTPQSCRHILSVEDCDGLRDHALGAPKYFIGRDIANDICLNSQFASRYHALLLRVPAEREGEYFYRLLDGDLEGKPSTNGLTVNGLKVSAHELHEGDEISFGPDAKATYRVECLSADAK
hypothetical protein Synechococcus sp. CB0205 MQPRLSQQEQRALIRAKRAVRCLPFRRRFYEELEREALSSTQLAARSDWTALSCRRLSANHCEYLLIWLIQLGVLRREVDGQGLTERVRLTPLGRVVLSDWPGEIPSASLPSRLRHWIKQHWPRL
...e protein Chroococcidiopsis thermalis PCC 7203 MTDNLTPQDETCKPKDDEALAVCVQALGLPQIQRHVFICADQTLPQCCSKEASLESWDYLKKRLKQLKLDKPTSD...RPSCIFRTKANCLRVCTNGPILVVYPDGVWYRQATPEVIERIIQEHLIGNQIVREYAFLVHPLPEPTSDAIADDN ...
hypothetical protein Pro1185 Prochlorococcus marinus subsp. marinus str. CCMP1375 MLNGLNLCTLLEKYCYSKKISDKEVEFLERCWLDSLNSLHYNRYPGIAPAIVCDEAGVARGSYWISCNAAILDKLKPLGTTRSRSARIFDVLFQSGLIAA ...
unnamed protein product Ostreococcus tauri MKTRWSVECASPCSRRARIFDRSTRCSCRTARNVCSPWRCPRLVRRGVAGSTGRLRSRLSCAQRRRSGARPCPSPDRSGCQSSSTCTSSRRLGDTFCTPRTRPVSSRVP
ribosomal protein L18 Prochlorococcus marinus str. MIT 9211 MATLSKKQQTQKRHKRLRRHLNGTNHRPRLAVFRSNNHIYAQVIDDEAQSTICSASTLDKDLREKLKASGGSCDASMAVGALLAQRALAKGIEQVVFDRGGNLYHGRVKALAKSAREAGLKF ...
...r protein) Oscillatoria sp. PCC 6506 MNNELLHSQFNATAVEFDPFASGEILVTAPATEAQKEIWLSVQMGDEANCAFNESQSLRLRGPLNLEILRS...SFQAIVQRHEALRTTLSADGSTLCITESLNLEIPLIDLSALSEQERKIQLAQLRRQAVEQPFNLEHGPLLRVQIIKLQAEEHLAIITAHHIICDGWSWGVFIPDLGAI
water-soluble carotenoid protein Stanieria cyanosphaera PCC 7437 MINSSLQIAGITNPTILDYFNTINQEEFIETANLFNENG...VLYAPFESPLEGKQAIASYLEKEAKDMKLEPKQGISENLADNLELIKVTGKVHTSLFSVNVRWEFTINSSQQLEAVKIKLLASPQELLKLNVKAN
hypothetical protein Npun_F5100 Nostoc punctiforme PCC 73102 MKTLVNLTQQSVVGEIESVLDTYPYHPYQKAFAIPDLRQELIVFVLTRLPSFDGAMSEGHISLAEAEQGSLAYYKLPRKPLEQQLHLQNLIHQGICLIVQEKSDWINDRVCEIVQPACEASHWFG ...
hypothetical protein Pleurocapsa sp. PCC 7319 MKFDLNLLKVLPLATLSIALLAPGKIAEAAGGYGAETGYGAETSYGADAGHGPATGGYGAETGYGADAGHGPATGGYGAETGHGADA...GHGPATGGYGAETGYGADAGHGPATGGYGTDAGHGADAGHGSATGGYGADAGHGADAGHGPATGGYGADAGHGADAGHGADAGHGADAGHGADAGHGADAGHGADA...GHGADAGHGADAGHGADAAGGPLGIPEFLVSKKAQRIEFFSFLTAIGLGIIIPEFVYKPRKNSQSQNSSANQQEQTAEIKQDTPSLKVVSENTKTLSVAADNTEKSNNQDTIEFKNQSEQDQEDQKAA
collagen triple helix repeat-containing protein 'Nostoc azollae' 0708 MRLIEDGEDGEDGEDGEDGEDGEDGEDGEDGEDGEDGEDGEDGGEIFLMPYALCPMPYALCPMPYALCPMPYALCPMPYALCPMPYAQNQDFSHPNRESSVKLFSSVAPKP ...
photosystem II reaction center protein Psb28 Geitlerinema sp. PCC 7407 MAQIQFSPGVSEAVIPDVRLTRARDGSSGTATFYFERPQALVGESNAEITGMYLVDEEGQVMTREVKAKFINGQPEALEATYIMRSSEEWDRFMRFMERYAEEHGLGFSKS
hypothetical protein Geitlerinema sp. PCC 7105 MAVWKFALGSIAGVLVGTSPSLALPGQTVEEVQTWIQTNPLLPRGLEGSL...RVSRSDIPGQRFTFTALKTLPTDSNLVVGGRYIRSERLEVLDYENGVTRDRLVSTLRSIYDLDIYRDYRDAEVVYDYVSPAGREMQDVYRGVLLKGDRFGYWIEITEREGIEPVIGHLAIVLAEDVETLETQLRNR
hypothetical protein Geitlerinema sp. PCC 7105 MSDKLEDFNGRNLDTKKSEGVERRKENRGKLLIDATVAPADIKYPTDVELLN...QARKTTELILDILYKSLKGQLYKKPRTHRKLARKEYLKFAKKRRPSRKERRNAVKKQLQYLQRNFRNIEKLIEKGASLECLSRRQYRNLLVSSEITRQQQWMWSNQ
water-soluble carotenoid protein Geitlerinema sp. PCC 7407 MTQSLVPQQPVTQPLSAVASQFPVIQQYFDALNGETYPEAASLFEEAGVL...RAPFEQPVQGREAIAAYLSQEARGMVLQPDQGTVASTAYGAEITIQGKVQTRLFKVNVAWIFEVVSARSALASVQVKLLASPQELLKLRR
hypothetical protein Geitlerinema sp. PCC 7105 MLYIYNHKLRRPSPLYIFKNFFESKAVENLLGEGIKAEYLNDDRLGRVLDKI...YRFGLNRIFVAIALATVKKYELTVNSNHLDSSSFHVHGDYPNSGESGTIEITYGYSRDHRPDLKQFLMNLICTGDGDVPLWMKMGSGNDSDSKQFGRSMVEFKKHFSLKV
hypothetical protein Geitlerinema sp. PCC 7105 MLMNIIFLTSAITTAAIATYFITHRLGGFRTVATLLCLGLLAGTGSLPAL...ASTDIAALGAKSNTLEQGQQQLEEDLKLTPTGGQYSGIEYAKGAERGEPMTDRKIRETILSDTSEDLTVNVASGSVILSGTVRDKDRAREIVDEIKGISGVHEITFELGLEEGQSS
hypothetical protein Geitlerinema sp. PCC 7105 MQKKYIVRLTAEERHQLQAVIKKLNGSSQKVRRAQILLKADADGPNWTDQQ...IAEAFDCRTKTVENIRRRLVEQGFEITLNGVKRTRPPTDKRLDGEQEAQVIAMRLGPPPAGYANWSLRLLARKVVELGLVEAVSHETVRQTLKKTG
hypothetical protein Geitlerinema sp. PCC 7105 MKNKATAALFAFFLGFLGIHKFYLGQNFSGVLYLILSWTGIPAILAIFDFLGLLLMSDATFNVRYNNMVTVVRDNLVVTPSPDRSRSATEITRALADLKKLYEVGAVTAEEYEEKRQKLLSEL
predicted protein Micromonas pusilla CCMP1545 MHPPLTLENHPLCKDVVIALKRCHRDNPWARAWGACNEQKWALDDCLKKQKLFKFRANHAKAKAQQDRLRRRVEKYGHATTNGQFKG
WD-40 repeat protein Arthrospira platensis NIES-39 MVIASGGASLFNLATGEAVWEIDCPALGGAVSADGRLLALRSNKDIYLWDLSTGQLLRQLTGHTST...VNSVRFSRRGQTLASGSGDNTVRLWDVATGRELRQLTGHTSTVNSVRFSRRGQTLASGSGDNTVRLWDVATGRELRQLTGHTSTVYSVSFSRRGQTLASGSDDGVVRLWRVGF
Uncharacterized protein TCM_008191 Theobroma cacao MEETSLGLSFTKDENFREWYSEIYFVAVNSEMIECNDISSYYILRSRAISIGRLTCTFPAAVCTPEIKASFNILLRVNDVQ
conserved hypothetical protein, ribA/ribD-fused Moorea producens MTIYFYDISEKPYGCFSNFSPHGFELDGLWWPTSEHYFQAQK...FAGTSHVEEIRSCKTPAEAASMGRERTRPLRRDWEEIKEDVMGRGLLCKFQTHADIREILLGTGDELIVEDAPQDYYWGCGKDRSGKNRLGEILMEIRAILRES
predicted protein Populus trichocarpa MMMNAVFAADAEIGSDAVFAADSGLGSDAVFAADFGLGSDAVFAADAEIGSDAVFAA...DSGLGSDTAFTTDTEIGSNTVFAAHFLIGSDAVFAADAEIGSDAVFAADFSMNSDAELGGRVFAADAEIGSDAVFAADSGLGSEAVFAADSGLGSDAVFAA...DAEIGSDAVFAADSGLGSDAVFAADSGLGSDAVFAADAEIGSDKVFAADSGLGSDAVFAADSGLGSHAVFAADAEIGSDAVFAADSGLGSDAVFAA...DSGLGSDAAFTTDTEIGSDTVFAAHFLIGSDAVFAADAEIGSDAVFAADFSMNSDAELGGRGKNWF ...
hypothetical protein, partial Prochlorococcus sp. scB241_526B22 VYSSHLNNQRELIVTSESTRESINLAKYLTDNGVVKYSAYWCPNCLNQSELFGKQAYRELNVVECARDGINSQTQLCIDKKIKGFPTGEINGALILGVLSLKELSKLTGFKN
predicted protein Chlamydomonas reinhardtii MAPAALPGRSVKSKQAHLLRTDAHRVKSKQAHLLRTDAHRVKSKQAHLLRTDA...HRVKSKQAHLLRTDAHRVKSKQAHLLRTDAHRVALTTLTGALSLFGGACTATSFVLQVSASAASYAASLRLSCPAVPSLTDVA
hypothetical protein VOLCADRAFT_87241 Volvox carteri f. nagariensis MPNPLAEMELLGFWGLKLVATVTDCHMSDSGRVMTAFVFKVVSYRNEAAST...LAEMELLGFWGLKLVATVTDCHMSDSGRVMTAFVFKVVSYRNEAASTMLTPEPLPESLEYLQAQVERALDERRELERVMWA...AREGRGGPSMLSCKQLETIELSTMGEAAELEVKRALEAITVVQYSMPNPLAEMELLGFWGLKLVATVTDCHMSDSGRVMTAFVFKVVSYRNEAASTMLTPEPLPESLEYLQAQVERALDERRELERVMWAAREGRGGPSMLSCKQLETIELSTMGEAAELEVKRALEEMFH ...
hypothetical protein Npun_R2630 Nostoc punctiforme PCC 73102 MDTLDLQSVSTEDVMLRYGIKSRTTLNKFLENAGVNSFKEGRKTFIRMYQLGVLDRSAH...ELNYPINQSSNQSIQSIHPTDSIKSEQMELAESTGLFPLTTVDLLYITCEYENLPRLAKWLAGYAFLEKMSSGRVILPRDVVLKILDYKRLPTCKDGYFRYGNFVFLMIGDHKKEWLVSKK
RecR protein Prochlorococcus marinus str. MIT 9211 MPGIGPRTAQRLALHLLRQPEERIKAFANALLNARNQVGQCQQCFHLT...EGNECEICLNQNRQRNLICVVADSRDLLALERTREYKGLYHVLGGLISPMDGIGPELLNISPLVKRITSEETTEVILALTPSVEGDTTSLYLAKLLNAFVKVTRIAYGLPVGSELEYADEVTLARALEGRRTVE
50S ribosomal protein L18 Prochlorococcus sp. W2 MNTISRKQQTQKRHRRLRRFLVGTKAKPRLSVFRSNNHIYAQVIDDQAQSTICSASTIDKEFKIKDNESTSNCNSSSEVGLLLAKRAIKKGVKEVVFDRGGKIYHGRVKALADAARKAGLKF
Conserved Zn-finger protein (ISS) Ostreococcus tauri MSAHPRYRDDDRDRARARGGDDDRDRARDDRRPRFGATDDGDRG...QWPYAVKIYRDERGEKKDEAVITYDDPHAAQSAPEFYNGYEHNGKKLHVSMAQSKPKAPPPSQGDNDRGHGGRGGDDRDRGGYGDRGGRGGGYGDRGGYGDRDRGGYGDRDRGGYGDRDRGGYGDRGRERGRFGDRDRGGHRPY
SR protein factor Chlamydomonas reinhardtii MSYRDRDRDRGDRGYSDRDRDRGRDDRRGGDRGGDRGGGGGGDRG...PRDMMRIESKTKGDERRDDRRRSRSRSPRRSSRRSSRSPRRSRSRSPRRSRSPRADRGRDRSPRDRSPRDRSPRDRSPRDRSPRERSPVRVERERSPERERSPERERVREDSRSPPPRERSPPPRDRSPPPRERSPSPRRDSPPRDDYAGDDF
hypothetical protein, partial Prochlorococcus sp. scB243_496A2 MRILLAAAECAPMIKVGGMGDVVGSLPPSLIKLGHDVRVIIPGYGKLWSLLEVSNEPVFRTNTMGTDFAVYEAKHPIHNYVIYLVGHPTFDSDQIYGGENEDWRFTFFASAT
SPYFYASWMPEKEDDYRFTNKKRTPLECSTGTKDARSAALKAISWVKEKQKDCLRKITEYQEVKTKCLEHYWEEHFIDFSSTRASRKSVTKLINDEKLKWCSPTYGIG ... hypothetical protein Prochlorococcus sp. scB243_495N4 MTSLSAMDGKLNDRTWINISESRYELELGNRVSFPINLYLKKRVN
hypothetical protein Cal6303_0799 Calothrix sp. PCC 6303 MSNLSHGGFDDSKPTIDENSFWQYIRSLHPQTVKQLNKPSSTDVVETINLTVATILDHISDDSSDSQIVTSHDELGMLLGSVMIDGYFLRNAEQRMELDHIFQELGTGGEQE
photosystem I reaction center protein PsaF subunit III Geitlerinema sp. PCC 7407 MRRLFALVLVVFLWIGFAPPASADVA...GLTPCGDSPAFLQRAKNATTPGAKARFERYAESQVLCGPEGLPHLIVDGRLDHAGEFLIPGLLFLYIAGWIGWAGRSYIIAIRKEGNPEEKEIIIDIPLALKLSLAALAWPATALKEILSGEIAAKNEEITVSPR
hypothetical protein GEI7407_0462 Geitlerinema sp. PCC 7407 MEKIESLPIIGWREWLALPELGISAIKTKVDTGARSSALH...AFDLRRFCQAGQEWVRFTIHPYQHDLQQTVVATARVIDERQVRTSSGHTELRPVIHTPILLGGCQWPIEITLTNRDVMGFRMLLGRQAIRQRFLVDPGHSFLLSSLRLPLRSPTSRSQPL
hypothetical protein GEI7407_2207 Geitlerinema sp. PCC 7407 MKSLSLCVLLSAGVLSLGALPSQAMAPAELVETPPNHS...VAIHRSEGNCPAQVDLWVQSRYYEGGGEFSALVDTAAIAGRAVFLEAKQDFVEFAAPLKPQYASCYGYVVSSDEPQYNLWFYKGYVYFRFDLQSLPGRPLSEITSQAIIEDRPFMRWAIAD
hypothetical protein GEI7407_1530 Geitlerinema sp. PCC 7407 MAQLQRLDIVGDDGQTIEIFVEEKDAPVLATSPNRDGRP...SMGAGSPSVKMQQMQQVIRGYATYALNAFKDFSAAEVEEITLMFGVKLSASAGIPYIANGTTDSNLEVQVKCRFPAKDG
30S ribosomal protein S11 Geitlerinema sp. PCC 7105 MARQSGKKSGARKQKRHVPNGVAYIQSTFNNTIVTITDPRGETISWA...SAGSSGFKGAKKGTPYAAQTAAESAARRASDQGMRQIEVMVSGPGSGRETAIRAIQGAGLEITLIRDITPIPHNGCRPPKRRRV
peptidoglycan-binding domain 1 protein Geitlerinema sp. PCC 7407 MKLQTIFRLLTIAVGLLGGSFTQPAIAHSLASQQTS...PVLIATAYTDLTLPTLRQGDRGRSVELLQNILLDNGFLGAAGVRLGNPQGAIVDGIFGEITAAAIRDLQRRYEIPVTGQVNPTTWEVLDMYENPYRSPLPWKQ
...79506 Solanum tuberosum MEETSTSSNNAKAKARVRVCITRKKTLKDKRAKLYIIRRCLYMLLCWKERAEFCNVGNRESTA ... PREDICTED: uncharacterized protein LOC1025
WP_026796581.1 hypothetical protein Plankt... NDVINTIEHLLETEFQQSCIHKRLKLPGLASEIALVVDGTLQTIGFYHQKIHVLSEMNKTIACSIAKAQRELGYNPTIALEEGMRRSLKWIFENYGGLD
WP_027254540.1 hypothetical protein Plankt... FCQRYKPKEKKTPTRCHWGSKLLAGVHLSNKTLTTNPKKSKSRLVQTPCQVSKSPELTRVVSQFIEANRAPWQAEKDF
WP_026796754.1 hypothetical protein Plankt... LAKLKQDIAQTEALNPMEKAMVEVPIKMIESELQKPEANKTLINQAVVALKKGLEGVETLAEPVIKVAAILAKVWI
WP_027249473.1 hypothetical protein Plankt... RQISFRDQNNTVQWVIHRPDETPTESQWTILDQGVQIDTEETTLYQNKTTKIWRMQFDHKGRANGQLGRMTVSLRNGSPAKRCTFVSTLLGTLRTSQNNPKPKDGKYCY
WP_026796485.1 hypothetical protein Plankt... GNYADSEAIFRQLVENQPKEAKYHFYLGNSLFYQRKIEEATQVYQEAISLNPQYGLAYNALGFLHASQGQWDEAIAQYQKALEINPDYAEALKNLGESLWKKGNTAEANNAWKKALELYTQQGNNKAVLQLQEMLNKTSQ
WP_026788149.1 hypothetical protein Plankt... ARTEQLPEPVYTQGLIRTYADALGLNGVELANFFLPEPQKVGMKSKLNFLTLPQLRPTHLYLTYILLIICAINGVSYLNKTANFASVSGEPVATTNPPEVNPQLRQAV
WP_026786633.1 hypothetical protein Plankt... HEAQPDKFPHIPASMWWAVITLTTVGYGDVYPITPLGRLLGGILALLGIGLIALPAGIIASGFTEVIALNQRKNKTIYPKICPHCGKNIDQPLEDSTDLDH
WP_026785525.1 hypothetical protein Plankt... GVALLGMAYPIFSKMLSNDTLTKEPFRVFFALAIFLLSIASFTLLFKARVKLWKGIFATFTGMGLIILGSQPEIYRRDNEWFVSHYYYGITAALLMIFSVAIVQDIYQDKQNRWRTAHIILNCFALLLFIGQGMTGARDLLEIPLHWQEHYIYQCDFTNKTCSQPK
WP_027255564.1 hypothetical protein Plankt... TSGRKQAKSGKGFSPVMVGQKWMLSQLEKLVPVVKIEGYRTASTRKYLGLKKNKTDKSKPEFNTHAVDGVAIAATAFVEYR
WP_027248974.1 hypothetical protein Plankt... NGENVLIEMQAFNVPAFGKRILYNTAKMYVNQLKLGEVYPELRAAIGVAVTDFIMFNEHNKVISQFTLKEDELQVNYQHSPLKLVFVELPKFNKTLEELTTITDKWLY
WP_027255213.1 hypothetical protein Plankt... SPPDGVSPLSETPTPAITTPISPTPQVKQPESAILGLVFVTPAQKPIQPALKPQIIPGTQSQNKTSTKTACSVQPTTGNICTTPLPSAVVPSSTTTESYWATPFILYF
WP_026798031.1 hypothetical protein Plankt... LCEEISSQLLLPVETDYVDSDFNYSSLWQNKTVETSWFSKILYTAQKPNSQPIFSPSLVSFLVGCTDSEATAKKSKKIRIYLNPEQKKLLKQWFGVSRFVYNETIKYLQQPDTKANWMAIKTGILNGLPEWAKPWVD
WP_027248948.1 hypothetical protein Plankt... IFVEEGSVLNEKIEKAYSELKIEVKKKESTSDQQEKARNWMIENFYDIRMFGAVLSTGLNAGQVWGPLQISWGRSYDPVLPISATITRCAATEAKEKKDNKTMGRKEL
WP_026787645.1 hypothetical protein Plankt... NYAASEAIFRQLVENQPKEAKYHFYLGNSLFYQRKIEEATQVYQEAISLNPQYGLAYNALGFLHASQGQWDEAIAQYQKALEINPDYAEALKNLGESLWKKGNTAEANNAWKKALELYTQQGNNKAVLQLQEMLNKTSQ
WP_026798653.1 hypothetical protein Plankt... QLNNGENVLIEMQAFNVPAFGKKILYNTAKMYVNQLKLGEVYPELRAAIGVAVTDFIMFNEHNKVISQFTLKEDELQVNYQHSPLKLVFVELPKFNKTLEELTTITDK
WP_027249886.1 hypothetical protein Plankt... VINTIEHLLETEFQQSCIHKRLKLPGLASEIALVVDGTLQTLGFYHQKIHVLSEMNKTIACSIAKAQRELGYNPTIALEEGMRRSLKWIFENYGGLD
WP_026786359.1 hypothetical protein Plankt... INCYRVIKDNVEELIEVLKVHKAKNSKEYFDYLRERDRLKQYNKFSDIQKAARIIYLNKTCYNGLFRVNSKGQFNVPFGSYKNPNILDEAVLRGVNDYLNQKSVTFLN
WP_027250244.1 hypothetical protein Plankt... DKVMTIVESLSGYKLYKTASENFGLIFETAQKIINLPEPARKDIAKWLKLSNPCSVNKIGDIQENLYFLGDFSEAIIQAGLSQNKTFFSRN
WP_027249441.1 hypothetical protein Plankt... KWIREDRMSSGMWRTIIHIGEIFLSSEGSVILIDEFENSLGINCIDILTEDLIHENKTLQFIATSHHPYIINNIPYEYWKIVTRQGGHISIGNASDYHLGKSKQDAFIQLTKILEKQS
WP_026787834.1 hypothetical protein Plankt... NDIINTIEHLLETEFQQSCTHKRLKLPGLASEIALVVDGTLQTLGFYHQKIHVLSEMNKTIACSIAKSQRELGYNPTITLEEGMRRSLKWIFENYGGLD
WP_026796432.1 hypothetical protein Plankt... LCEEISSQLLLPVETDYVDSDFNYSSLWQNKTVETSWFSKILYTAQKPNSQPIFSPSLVSFLVGCTDSEATAKKSKKIRIYLNPEQKKLLKQWFGVSRFVYNETIKYLQQPDTKANWMAIKTGILNGLPEWAKPGID
WP_026786027.1 hypothetical protein Plankt... GFCQRYKPKEKKTPTRCHWGSKLLAGVHLSNKTLTTNPKKSKSRLVQTPCQVSKRPELTRIVSQFIEANRAPWQAEKDF
WP_027249996.1 hypothetical protein Plankt... RTEQLPEPVYTQGLIRTYADALGLNGVELANFFLPEPQKVGMKSKLNFLTLPQLRPTHLYLTYILLIICAINGVSYLNKTANFASVSGEPVATTNPPEVNAQLRQAVV
WP_026797935.1 hypothetical protein Plankt... LINCYRVIKDNVEELIEVLKVHKAKNSKEYFDYLRERDRLKQYNKFSDIQKAARIIYLNKTCYNGLFRVNSKGQFNVPFGSYKNPNILDEAVLRGVNDYLNQKSVTFL
WP_027254413.1 hypothetical protein Plankt... NLNSDWFCYHDRNFGRFRWGEDIGWEWFVIFAQTETKIPLTLILDWRTNKTHSQGGLPYIFIYQNHQLRKIFLGETLRLNW
WP_026797408.1 hypothetical protein Plankt... WLWNRQNQLGIAFDSSTGFHLPNGADRSPDASWIRQERWDLLTQEEREIFAPICPDFVLELRSKNDAIEKLQAKMIEYIENGASLGWLIDRKNKTVEIYRQNQDIELLNHPLILSGEDILPGFMLNLTEVWN
WP_027249824.1 hypothetical protein Plankt... NLNSDWFCYHDRNFGRFRWGEDIGWEWFVIFAQTETKIPLTLILDWRTNKTHSQGGLPYIFIYQNHQLRKIFLGETLRLNW
WP_027254369.1 hypothetical protein Plankt... VINTIEHLLETEFQQSCIHKRLKLPGLASEIALVVDGTLQTIGFYHQKIHVLSEMNKTIACSIAKAQRELGYNPTIALEAGMRKSLKWIFENYGGLD
WP_027254586.1 hypothetical protein Plankt... WNRQNQLGIAFDSSTGFHLPNGADRSPDASWIRQERWDLLTQEEREIFAPICPDFVLELRSKNDALEKLQAKMIEYIENGASLGWLIDRKNKTVEIYRQNQDIELLNHPLILSGEDILPGFMLDLTEVWN
WP_027250041.1 hypothetical protein Plankt... SPPDGVSPLWETPTPAITTPISPTPQVKQPQSAILGLVFVTPAQKPIQPALKPQIIPGTQSQNKTSTKTACSVQPTTGNICTTPLPSAVVPSSTTTESYWATPFILYF
WP_027254850.1 hypothetical protein Plankt... WIREDRMSSGMWRTIIHIGEIFLSSEGSVILIDEFENSLGINCIDILTEDLIHENKTLQFIATSHHPYIINNIPYEYWKIVTRQGGHISIGNASDYHLGKSKQDAFIQLTKILEKQS
WP_027254871.1 hypothetical protein Plankt... DCCAWSMQTVYSELQKHGAEFRFVPWDTFRDGARERNKTVPSELGGFSRSNDAAFLQEAADFINNQLDPNRPLVLIGHSFGGDSLLSLVPRINRRIQFLGVIDPTAAG
hypothetical protein, partial Prochlorococcus sp. scB243_498P3 MSTKSDSLKEKLIENFSDFSKLSDYSFMNYLRADPQ...STKDGNDHKPRSVYSGHYVPVLPTAIPEPEYISHSKKLFKELRLSSDLTKDKNFCLFFSGDISVANYPMSPVGWATGYALSIYGTEYTQQCPFGTGNGYGDGIAISVFEGLFNGKRMEMQLKGGGPTPYCRGA
hypothetical protein VOLCADRAFT_66785 Volvox carteri f. nagariensis MRVGERDCPRVGERD...CPGVGERDCPGVGERDCPRVGERDCPGVGERDCPGVGERDCPRVGERDCPGVGERDCPRVGERDCPGVGERDCPGVGERDCPGVGERDCWPNVDSWTNLSNGRLMRVGER...DCPRVGERDCPGVGERDCPGVGERDCPRVGERDCPGVGERDCPGVGERDCPRVGERDCPGVGERDCPRVGERDCPGVGERDCPGVGERDCPGVGERDCWPNVDSWTNVCVVFRFLNLGPN
hypothetical protein VOLCADRAFT_35996, partial Volvox carteri f. nagariensis HIAYCISH...IAYCISHIAYCISHIAYCILHIAYCISHIAYCVSHIAYRILHIAYRILHIAYRILHIAYCILHIAYCILHIAYRISHIAYCISHPYRCIWHIAY
hypothetical protein VOLCADRAFT_99209 Volvox carteri f. nagariensis MQMHAHTYNISHIVYCISH...IAYCISHIAYRISHIAYRISHIVYRVSHIAYRILHIAYCILHIAYCILHIAYCILHIAYCILHIAYCILHIAYRISHIAAYMAYRISHTAYRISQIAYRCISHIAAYRCILHITYMHIIYAHI
hypothetical protein VOLCADRAFT_100737 Volvox carteri f. nagariensis MYNISHIVYCISHIAYCISH...IAYRISHIAYRILHIAYCISHIAYCISHIAYCISHIAYRISHIPYRCTISLHMAYRISHTARISHIANCISLHIAYCILHIAYCISHIAYPISLHHIAAYGISHITYRTHIAYRKLHIAAYRISLHIAAYCISHIHICIYAHI
hypothetical protein VOLCADRAFT_90903 Volvox carteri f. nagariensis MQMHIVYCISH...IAYCILHIAYRILHIAYCISHIAYRILHIAYCILHIAYCISHVAYCISHIPYRCIWHIARISHTAYRIPQITYRCISHIAAYRCILHITYTYMYIYAHI
hypothetical protein VOLCADRAFT_71945 Volvox carteri f. nagariensis MRICLHIAYVCISH...IAYRICACLHIAISHIIHIAYRILPIAYCISHIAYCISHIAYCILHIAYCISHIAYRISHIAYCISHIAYCISHIAYCILHIAYCILHIAYCILHIAYCILHIAYCILHIAYCILHIAYCILHIAAYGILHIAYAYRSQHSIA
predicted protein Micromonas sp. RCC299 MRARGKVEVELQGVQRLSARCKECGGSQICEHGRQRFHCRECGGSGICEHGRGRHRCKECG...GSQICEHGRVRSQCKECGGSGICEHGRRRSLCKECGGSGICEHGRQRYSCKECGGAGICEHGRERYSCKECRAAKAGTFPDVDVEVGVTEDA...SSKGAKRKRAPYTKGPCEHGVKYRSQCKVCSACPHGRQRNKCKECGGASICVHGRERNKCKECGGASICEHGRQRSHCKECGGASICVHARERNKCKECG...GASFCEHGRQRRYCKECGGSQICEHGRVRRLCKECGGSGICEHGRQRPQCKECGGSQICEHGRQRYSCKECRAAKAKQR
predicted protein Micromonas sp. RCC299 MLPDVDVEVGVTEDASSKGTKRKRAPKTKGPCEHGVKRRSNCKVCSACPHGKWRYWCKECG...GAGICEHGRERRRCKECGGASICEHGRQRRYCKECGGGSICEHGRVRYYCKECGGSGICEHGRDRSRCKECGGGSICEHGRERYYCKECGGSQICEHGRRRSECKECG...GSQICEHGRRRSECKECGGSAICEHGRQRYYCKECGGSGICEHGRDRSRCKECGGGSICEHGRERYYCKECG...GAGICEHGRIRSTCKECGGSRICEHDRQRHTCKDCGGSQICEHGRVRSKCKECGGSGICEHGRHRQYCKECGGGSFCEHGRQRRKCKECGGSQI
predicted protein Micromonas sp. RCC299 MPAIWNVSGPLPDVDVEVGVTEDASSKGTKRKRAPPTKGPCEHG...VKPRSKCKVCSACPHGKRRSECKECGGSQICEHGRRRTQCKECGGSQICEHGRVRSTCKECGGSGLCEHGRERSRCKECGGPGICEHGRVRSRCKECGGSQICEHGRQRSKCKECG...GGSICEHGRIRSTCKECGGSQICEHGRERSKCKECGGGAICEHGRIRSTCKECGGGAICEHGRERHRCKECGGSGICEHGRRRSQCKECG...GSAICEHGRHRQYCKECGGGSICEHGRIRSTCKECGGGAICEHGRQRHRCKECGGASFCEHGRQRSRCKECGGSGICEHGRRRSTCKECRAAN
Full Text Available 218 ... 38832:274 ... 38833:414 ... 564608:414 predicted protein Micromonas pusilla CCMP1545 MAQLPSYEVDDGEDSMPGAPGEGPMTDSMQGPPVEVPTSDSM...PGAPGEVPTMDSMHGPPVEVPTMDSMHGAPVEVPMMDSMPGAPVEVPTMDSMQGPPVEVPTMDSMQGAPGEGPMTDSMHGAPGEGPTMDSMNSGNPTKCVVPDWCSTYPPEMQKSKPECQCPDDSHP
Full Text Available 401 ... 38832:1362 ... 38833:877 ... 564608:877 predicted protein Micromonas pusilla CCMP1545 MQDSMHDTMHDSIQDSMHDSIQDSMQDSM...AKEEEEPAEPPAKEEEAHAEPPAKEEEAHAEPPAKEEDYAEPPAKEEDYAEPPAKEEAHSMDSMDSMDSMDSMDSMDSMHSMDSMDSMDSMDSMHSMDSMDSMDSM...DSMHSMDSMDSMDSMDSMDSMDSMHSMDTSIDAVDAAANVTDAADTAGAAANVTDAADTAGAAAEEKPPENASVDSLDSLLDG
Full Text Available 3051:4703 ... 3052:4703 ... 3055:4703 ... hypothetical protein CHLREDRAFT_120274, partial Chlamydomonas reinhardtii PPGCRCSSAPPGCRC...SSAPPGCRCSSAPPGCRCSSAPPGCRCSSAPPGCRCSSAPPGCRCSSAPPGCRCSSAPPGCRCSSAPPGCRCSSAPPGCRCSSAPPGCRCSSAPPGCRCS
Full Text Available 3688:2831 ... 238069:2831 ... 3689:2831 ... 3694:2831 ... caprice family protein Populus trichocarpa MDRRRKKQAKTTSCCSEQEVSSIEWEFINMSEQEEDLIYRMHNLVGDRWALIAGRIPGRKAEEIERFWLMRHGEGFASRRREQKRCHS
Full Text Available 35472:181 ... 41891:181 ... 248742:181 ... 574566:181 hypothetical protein COCSUDRAFT_37270 Coccomyxa subellipso...idea C-169 MAAAVLSLLATSCTPATGAMPAFARMSIDIAEASEVESSAEASTSKAPMPVYFGNGCFWGRQKDFVDAEKALGRSPEQISSVVGYAGGREQGPKGRV
Full Text Available 65:2039 ... 3066:2039 ... 3067:2039 ... 3068:2039 ... hypothetical protein VOLCADRAFT_35179, partial Volvox carteri f. nagariensis EDRG...PRTEDRGPRTEDRGPRTEDRGPRTEDRGPRTEDRGPRTEDRGPRTEDRGPRTEDRGPRTEDRG
Full Text Available WP_015170531.1 ... 1117:3357 ... 1150:56771 1301283:78467 ... 63132:1699 1173025:617 ... hypothetical protein Geit...QHDLQQTVVATARVIDERQVRTSSGHTELRPVIHTPILLGGCQWPIEITLTNRDVMGFRMLLGRQAIRQRFLVDPGHSFLLSSLRLPLRSPTSRSQPL
Full Text Available WP_015171207.1 ... 1117:173 ... 1150:57133 1301283:78870 ... 63132:2023 1173025:1133 ... hypothetical protein Geit...RTSIPGAVRYTVVYDNGANQAVVAVAPEITETELEATLRQAAGDLFSLGRYGGQDNQFMIRARTIIHPSEGLSKPLFLGQVKRSLAVREDENMQVELFRQNFAELPSDRA
Full Text Available YP_007110277.1 1117:15352 1150:9068 63132:1761 1173025:1761 hypothetical protein GEI7407_2752 Geit...GGTEYTVIPDTLTIGGPATLVGASDRGIEYTAPLQSRYASCVGETLEQPERYYHARFQNGQVTFRVDFTALPSGLYSEITHLNVVNARPYVRWAVVD ...
Full Text Available YP_007109738.1 1117:15352 1150:9068 63132:1761 1173025:1761 hypothetical protein GEI7407_2207 Geit...DTAAIAGRAVFLEAKQDFVEFAAPLKPQYASCYGYVVSSDEPQYNLWFYKGYVYFRFDLQSLPGRPLSEITSQAIIEDRPFMRWAIAD ...
Full Text Available YP_007110830.1 1117:12257 1150:7214 63132:1987 1173025:1987 hypothetical protein GEI7407_3311 Geit...TAKGYVLWVLEPDAYLDGAEAIAPQAETPAQPTQTSTPCKILDSKSQYRTCHIRVPDLQQRLAALWVDGKYYALFKVVPTVDKAMEITARFGRRGDETVIAKTKKGYSVWVLEPEAYPAPTP ...
Full Text Available YP_007109292.1 1117:7227 1150:63 63132:1506 1173025:1506 hypothetical protein GEI7407_1753 Geit...IGLIALLLLGVTVFTQSQQQLLTRVTAERDRITLVQDDGGDLSTVLRQTSGEVVYRVTLSDAPVGQKLSLSCNWMDPNGQIVHQNRYQTKEITTPVWNTICRHTIGSAAPVGTWKVQMLLGDRLLSDTTFVVK ...
Full Text Available YP_007111118.1 1117:4619 1150:1758 63132:2521 1173025:2521 hypothetical protein GEI7407_3599 Geit...VTLADSLKRQRPAILFFYLDDSRDSKQYASVVSQLQAFYGRAADFLPLSIDTLPLEGSPDLKDPAHYYKGFVPQTVIIDQSGKVVFDQSGVLALESVDDVLRKVFDLLPRSESVELKRRPLNEINIEITSEPQ ...
Full Text Available WP_006910818.1 NZ_DS990557 1117:3961 ... 1118:8120 1301283:24412 ... 167375:439 180281:1513 ... DNA internalizat...ion competence protein, ComEC/Rec2 family protein Cyanobium sp. PCC 7001 MLLGAVLPLG
Full Text Available WP_006910587.1 NZ_DS990557 1117:3961 ... 1118:8120 1301283:24412 ... 167375:439 180281:1345 ... DNA internalizat...ion competence protein, ComEC/Rec2 family protein Cyanobium sp. PCC 7001 MWGALVLLVL
Full Text Available 803:12853 ... 3814:12853 ... 163735:5410 ... 3846:8024 1462606:8024 3847:8024 ... PREDICTED: uncharacterized protein LOC100796589 Glycine max MEETSWEQRVQALTHILTSPTTTPSLHSQFFIATQIPCYLNWDYPPFLCSSNPQLLK...
Full Text Available WP_026798282.1 ... 1117:5648 ... 1150:51942 1301283:73102 ... 54304:1817 54307:1346 ... hypothetical protein Plankt...othrix prolifica MNKTKLKFSTELRKLTTVQNPEALRAYCQSLKSQLVADPSNYAKGRYRLWLFHEVDFRDGTLSKGY
Full Text Available WP_026785726.1 ... 1117:3298 ... 1150:52043 1301283:73215 ... 54304:1908 59512:262 ... hypothetical protein Plankt...othrix rubescens MPTFFPEGKTININGEVELLAFQCQDTVVQLAAIAPGAIFPLHQHTESQIGMIFNGNLEMNLNGNKT
Full Text Available WP_026787603.1 ... 1117:6086 ... 1150:52096 1301283:73273 ... 54304:1956 59512:755 ... hypothetical protein Plankt...othrix rubescens MKNAMLEAADIKILEAAAAEDLARDRQFILEEDSNKTLAQQSYKAQQRDQRLVKAALIPRTGEAASP
Full Text Available WP_026798303.1 ... 1117:6692 ... 1150:51953 1301283:73114 ... 54304:1827 54307:1360 ... hypothetical protein Plankt...othrix prolifica MKRWKILSFQIILAALESCFLPAYSDLITNPAYINKMCQRQQDLPQIERFTVFYQQEFSSQNKTYW
Full Text Available WP_026796565.1 ... 1117:6622 ... 1150:52114 1301283:73294 ... 54304:1972 54307:411 ... hypothetical protein Plankt...othrix prolifica MFPLKSYLISKRISSQAFLISVLAVFTVILTVTLDSVSLAMTHPDAARNQTVYGQELIAQSRIPTSDQPSPSSLSDIPTADTASLFQNNRYAVRVFRQENKAYVNIYDKENKTLTLNNE
Full Text Available WP_027254883.1 ... 1117:2965 ... 1150:52982 1301283:74257 ... 54304:633 1160:1456 ... hypothetical protein Plankt...othrix agardhii MKTSLLILCQNRLKQKSLLQHQKTSGFTMIELLIGMIMAAVIITPILAFVVDVLQSDRKEGVKAATDQELEAATDFIKRDLSQAIYIYNKT
Full Text Available WP_026786488.1 ... 1117:44775 ... 1150:52433 1301283:73648 ... 54304:2259 59512:501 ... hypothetical protein Plankt...othrix rubescens MKIIQGYNPSKTISPMKIRKVKGVTIVEKYGDNLYVLPDENNNKTVPEFNKTDSFDINNWAEQATDLDGFYFINAITMTGNYLGSEWNDIILGLKFRGLATYISNH
Full Text Available WP_026788598.1 ... 1117:7257 ... 1150:51123 1301283:72193 ... 54304:108 59512:73 ... hypothetical protein Plankt...othrix rubescens MNSVEELARLQKRFQEAAKVIDDLSRIKQELDQLSKSYKDKLSNNSFELSQTKQEIESLSINHKEYKKYWHETFNAIHNKT
Full Text Available WP_027255690.1 ... 1117:21960 ... 1150:53273 1301283:74581 ... 54304:896 1160:1705 ... hypothetical protein Plankt...othrix agardhii MIPNKTQFLSELQVDSELDLELSTDPNQSIRKFVEHKQVIKFLSEQLSEIEPDAIVEALAIHQDNMNN
Full Text Available WP_026796708.1 ... 1117:53359 ... 1150:52148 1301283:73331 ... 54304:2001 54307:507 ... hypothetical protein Plankt...othrix prolifica MKTTFSWLSSYFLLTGLAISGITFLGEVRPASACTGFWGRMDPTCDHGGITNPVHMTTQDFKICNKTENSISFTLNGSLEAPLRVGYCRTYTNVILPGNVAFDASYADGYQESSYGLDDEKNYSFKLNNQGSGIDLFAD
Full Text Available WP_026796694.1 ... 1117:5991 ... 1150:52142 1301283:73325 ... 54304:1998 54307:498 ... hypothetical protein Plankt...othrix prolifica MLKITLTPEQEQFLQAQLKTGKYNNPQEVISKAFKLLEKENKTELLANIPASASAKKILTEKIKEFRDNLENTQNQPLNPEREKLSREVKELFDKTQSIPGIGDITEEEIAAEIEA
Full Text Available WP_027254510.1 ... 1117:7257 ... 1150:51123 1301283:72193 ... 54304:108 1160:168 ... hypothetical protein Plankt...othrix agardhii MNSVEELARLQKRFQEAAKVIDDLSRIKQELDQLSKSYKDKLSNNSFELSQTKQEIDSLSINHKEYKKYWHETFNAIHNKTENILTQISQIENKT
Full Text Available WP_027250437.1 ... 1117:53495 ... 1150:52763 1301283:74014 ... 54304:436 1160:1265 ... hypothetical protein Plankt...othrix agardhii MIVMPPPPPAIVSQVPHQAIFRDDFSRGCPGYSQAENQQIGNTAANHLAGITKNKTDSLVIFFTREFT
Full Text Available WP_027254794.1 ... 1117:7880 ... 1150:53167 1301283:74463 ... 54304:80 1160:53 ... hypothetical protein Plankt...othrix agardhii MEFEQALEVVNNAIAPKIARTLTEVEVALLFGAWNNLTYDRIAERSGYSINYLQRDIGPKFWKFLSEALGRKVNKT
Full Text Available WP_026796884.1 ... 1117:41752 ... 1150:52182 1301283:73369 ... 54304:2032 54307:622 ... hypothetical protein Plankt...othrix prolifica MNTNDEDQSISNIKRKLLEQINTLKCEDERMYNILAIDVWALAKTMDEFQPGFWGAFMKNREKALKRFLAESAKNKTDTDSKRPPFLR
Full Text Available WP_026798668.1 ... 1117:7321 ... 1150:52735 1301283:73983 ... 54304:410 54307:1082 ... hypothetical protein Plankt...othrix prolifica MHGGFYCTDETTQATYIQLHTSQGLEVLFFDSFIDSHFISFLEREHTDVKFARVDAELDDNLIAKDNSPEIVDPKTNKT
Full Text Available WP_027255601.1 ... 1117:6743 ... 1150:53326 1301283:74640 ... 54304:943 1160:195 ... hypothetical protein Plankt...othrix agardhii MVQILNKTLSLEDFLNLPETKPANEYINGQIIQKPMPQGKHSKLQGKLVTVINNMAEEQAIALALPELRC
Full Text Available WP_026796217.1 ... 1117:3298 ... 1150:52043 1301283:73215 ... 54304:1908 54307:20 ... hypothetical protein Plankt...othrix prolifica MPTFFPEGKTININGEVELLAFQCQDTVVQLAAIAPGAIFPLHQHTESQIGMIFNGNLEMNLNGNKTV
Full Text Available WP_027249778.1 ... 1117:3511 ... 1150:51681 1301283:72812 ... 54304:1582 1160:909 ... hypothetical protein Plankt...othrix agardhii MATSLKKLLIGTSVAVGISAVGITPALAGSLTNATIGGTASTDYLIYGKEGNKTVVIPNSVANLQSVLD...ATKWFGETLSKYGMTSSQTLFSNFLLAGGFQRFSDPNISYVNQDNKTGKITIGLAGHYDAASLLGLPSNPNNPIPNPNNPI
Full Text Available WP_026796606.1 ... 1117:6743 ... 1150:53326 1301283:74640 ... 54304:943 54307:170 ... hypothetical protein Plankt...othrix prolifica MVQILNKTLSLEDFLNLPETKPANEYINGQIIQKPMPQGKHSKLQGKLVTVINNMAEEQAIALALPEL
Full Text Available WP_026787573.1 ... 1117:6169 ... 1150:52274 1301283:73471 ... 54304:2115 59512:749 ... hypothetical protein Plankt...othrix rubescens MGNVSFASENKTLAQSSNISGWVDSFGFASTKQGAGQAGIDQGEKLGILFDGNFDNVINSLKANQLK
Full Text Available WP_026786329.1 ... 1117:7618 ... 1150:52216 1301283:73407 ... 54304:2063 59512:458 ... hypothetical protein Plankt...othrix rubescens MIPFQDSRLLLRALTYRSYMFENPNKTQGDNEQLEFLGDSVLQFLAGDYVYEKYFGEQEGQLTQKRE
Full Text Available WP_027249923.1 ... 1117:7063 ... 1150:51701 1301283:72835 ... 54304:160 1160:1002 ... chem...otaxis protein MotB Planktothrix agardhii MSDLSELELETELQEEQDSGVYLSIGDLMSGLLMFFALLFITVMVQLNKTQDIIKRIPDEMFTTMQ
Full Text Available 403 3745:2403 ... 171637:3833 721805:8 ... 3754:8 ... 102107:3050 ... PREDICTED: UPF0481 protein At3g47200-like Prunus mume MSKSKGRFQLPSSSSVISCMFSLSKQKQEESLDDDTQTQKDEWKKTLIGKLGNVHQREYLSHYGRGRKGKIRGE...
Full Text Available YP_007137393.1 NC_019751 1117:18359 ... 1161:10603 ... 1185:1278 1186:1626 32054:1873 1170562:1873 ... DNA alkyla...tion repair protein Calothrix sp. PCC 6303 MAQYLITQLQEQLAQADDSKTKEWWEAYLKHSLPFRGLKL
Full Text Available YP_007046027.1 NC_019675 1117:3386 ... 1118:18369 1301283:9302 ... 167375:2211 59930:1253 292564:1253 ... DNA alkylation repair protein Cyanobium gracile PCC 6307 MGMDRASMTPRAPTIPSAPSSIQKGTPLKHLLG...
Full Text Available YP_007137177.1 NC_019751 1117:18359 ... 1161:10603 ... 1185:1278 1186:1626 32054:1873 1170562:1873 ... DNA alkyla...tion repair protein Calothrix sp. PCC 6303 MAQYLITQLQEQLAQAGDSKTKEWWEAYLKHSLPFRGLKL
Full Text Available WP_023064372.1 ... 1117:4910 ... 1150:25585 1301283:43816 ... 28073:3666 118322:924 ... PEP-CTERM sorting , cyanoba...cterial subclass domain protein Lyngbya aestuarii MQNATIDGVYFVPEPFTILGTGTALGFGVLFKKESSKKRKKEKAKV
Full Text Available 41938:10941 3629:10941 ... 214909:10941 ... 3640:10941 ... 3641:10941 ... Encodes a chloroplast protein that induces tolerance to multiple environmental stresses and reduces photooxidative damage ...
Full Text Available 5:248 3803:248 ... 3814:248 ... 163722:3689 ... 3826:3689 ... 3827:3689 ... PREDICTED: uncharacterized protein LOC101515541 Cicer arietinum MNSSTI...CSLFLGLILISQSANAKGHGGGLVVTICKGATDRAACENILGSNSEISHAKSFSQL
Full Text Available WP_023067516.1 ... 1117:6681 ... 1150:23983 1301283:42036 ... 28073:2223 118322:2946 ... molybdenum Cofactor Synthe...sis C family protein Lyngbya aestuarii MPEGAEMDYILQQNLLTNDELLTLLREVFIPVGFTRFRLTGGEP
Full Text Available YP_007154859.1 NC_019775 1117:3991 ... 1161:5387 ... 1162:6232 1163:4927 1165:626 272123:626 ... putative addicti...on module antidote protein Anabaena cylindrica PCC 7122 MALTKDFKETVNARIQRDPDFAIVLLDEAISLFLNGELETARLILRNMLNLSHF
Full Text Available 0727 3803:10727 ... 3814:10727 ... 163735:2506 ... 3883:1736 ... 3885:1736 ... hypothetical protein PHAVU_009G116600g Phaseolus vulgaris MKKNRMMIM...ICSVGVVWMLLVGGSYGEQCGRQAGGALCPGGNCCSQFGWCGSTTDYCGKDCQSQC
Full Text Available WP_006104069.1 ... 1117:21745 ... 1150:63074 1301283:85471 ... 669368:4227 64178:4227 ... hydrogenase nickel incorporation protein HypA Coleofasciculus chthonoplastes MHETDMTKALILTVRQWWEEQPSRPQIDTIH...
Full Text Available WP_015182975.1 ... 1117:21745 ... 1150:36937 1301283:56429 ... 44471:3321 1173027:436 ... Hydrogenase-3 nickel incorporation protein HypA Microcoleus sp. PCC 7113 MHEVGIMQNTLDIALEYANRQGAAQIHRMTLRIGQ...
Full Text Available WP_026102361.1 NZ_KB235920 1117:21745 ... 52604:1915 ... 44474:816 118161:1544 ... hydrogenase nickel incorpora...tion protein HypA Pleurocapsa sp. PCC 7319 MHETDMTKALILTMKDWYVSQPEQRKIEKVHLIVGQFTCV
Full Text Available WP_015172367.1 ... 1117:21745 ... 1150:57512 1301283:79291 ... 63132:2365 1173025:1620 ... hydrogenase nickel incorporation protein HypA Geitlerinema sp. PCC 7407 MTKALLMTLHDWWESQPDRPKIDKVHLVVGQFTCV...
Full Text Available WP_017299262.1 ... 1117:21745 ... 1150:31644 1301283:50549 ... 1120752:1645 416001:1645 ... hydrogenase nickel incorporation protein HypA Nodosilinea nodulosa MHETDMTKALILTVRDWWEAQPGQPAIEKVHLTVGKFTC...
Full Text Available WP_015180155.1 ... 1117:21745 ... 1150:36937 1301283:56429 ... 44471:3321 1173027:436 ... Hydrogenase-3 nickel incorporation protein HypA Microcoleus sp. PCC 7113 MHETDMTKALIVTVRDWWEAQPEHPKISHIYLTVG...
Full Text Available YP_007110526.1 1117:2337 1150:9795 63132:2205 1173025:2205 hypothetical protein GEI7407_3004 Geit...lerinema sp. PCC 7407 MNKLLTLTVLGCVLSAAPAAIAAEWREITRNDVGDRFMIDTSSLDRRGSSVWFWEYRDFPQPNNAFLEETVD
Full Text Available YP_007108576.1 1117:24712 1150:4580 63132:1053 1173025:1053 LSU ribosomal protein L10P Geit...lerinema sp. PCC 7407 MGRTLEDKKAIVAELKETLSTAQMTFVIDYKGLSVSEITDLRNRLRPAGAVCKVTKNTLMRIAVDGSENWQPLTELLS
Full Text Available YP_007109143.1 1117:4837 1150:2767 63132:154 1173025:154 response regulator receiver protein Geit...lerinema sp. PCC 7407 MTFKILLVEDDFLLARGTAKLLQRLGDHSVEITDDPATIFQRCESGEIDLVMMDVNLPGAQWEGHPVSGADLAQRLKTKSSTAHIPIIIVSAYAMLSERQTLLKVSLADEFCTKPITDYEALLRLIEELVARSPQVERLH ...
Full Text Available YP_007108243.1 1117:25160 1150:6231 63132:809 1173025:809 iojap-like protein Geitle...rinema sp. PCC 7407 MMNFDTSDNRPTTSFEPLPTEAPVPSSDSQEAHELVKTIVEAAEDRKGAEITVLRVSEVSYLADYFVIVTGFSTTQVRAIARSIEAKVEEAWQRQPRKNEGINEGKWVLQDYGDVIVHVFMPQEREFYSLEAFWGHADRLTLPNLELAQEA ...
Full Text Available YP_007109074.1 1117:25729 1150:9238 63132:1257 1173025:1257 hypothetical protein GEI7407_1530 Geit...lerinema sp. PCC 7407 MAQLQRLDIVGDDGQTIEIFVEEKDAPVLATSPNRDGRPSMGAGSPSVKMQQMQQVIRGYATYALNAFKDFSAAEVEEITLMFGVKLSASAGIPYIANGTTDSNLEVQVKCRFPAKDG ...
Full Text Available WP_015170813.1 ... 1117:4710 ... 1150:56664 1301283:78348 ... 63132:1601 1173025:861 ... hypothetical protein Geit...lerinema sp. PCC 7407 MGIKRQIEITPLHCIHPGKGLEICPLDQAATATHTNAEQPTWGHETTLVTLAPGTIEDLFVH
Full Text Available YP_007108482.1 1117:25386 1150:6507 63132:985 1173025:985 Protein of unknown function DUF2605 Geit...lerinema sp. PCC 7407 MFSSDLPEPDLLKTVLLPLLEDFQYWFGRSRSLLESEEITFLSQDQQADLLARVCQAQQEVMAAQALFNATDGQVGVETAALMPWHQLVTECWQVGMRLRTEKSRS ...
Full Text Available WP_015170997.1 ... 1117:21655 ... 1150:57681 1301283:79478 ... 63132:2517 1173025:996 ... hypothetical protein Geit...lerinema sp. PCC 7407 MFSSDLPEPDLLKTVLLPLLEDFQYWFGRSRSLLESEEITFLSQDQQADLLARVCQAQQEVMAAQALFNATDGQVGVETAALMPWHQLVTECWQVGMRLRTEKSRS
Full Text Available YP_007109749.1 1117:35991 1150:9729 63132:1766 1173025:1766 hypothetical protein GEI7407_2218 Geit...lerinema sp. PCC 7407 MAEITDPEGEHRRIHWAAEQTDVATLINAYRGWYHWADAKEWASGAAFLRRLSQVGAGSPEAIALFIEQMNFHAQSRTKSGKYIYSAAKDALKALAEQGDPSAIAAWEEMQSSSEKP ...
Full Text Available YP_007108354.1 1117:2126 1150:6302 63132:896 1173025:896 SSU ribosomal protein S11P Geit...lerinema sp. PCC 7407 MARQTKKTGAKKQKRNVPSGVAHIQSTFNNTIVTITAPNGEVISWASAGSSGFKGAKKGTPFAAQTASESAARRATDQGMRQIEVMVSGPGAGRETAIRALQGAGLEITLIRDVTPIPHNGCRPPKRRRV ...
Full Text Available YP_007109075.1 1117:10916 1150:5016 63132:1375 1173025:1375 hypothetical protein GEI7407_1531 Geit...lerinema sp. PCC 7407 MTQSSDSSTSQAKHQRSFAEWVSFAIAAAIIASVLGLVAYTWATGDTQPPVLETEITPEVRQAGSQFYIPFSVTNTGGGTAESVQVIAELRVNGEVIETGEQQFDFLSGGEKAEGAFVFQRDPAQGDLSLRVASYSLP ...
Full Text Available WP_015170748.1 ... 1117:22225 ... 1150:58729 1301283:80642 ... 63132:3460 1173025:812 ... hypothetical protein Geit...lerinema sp. PCC 7407 MASTYSFDIVSDFDRQELVNAIDQTTREIGTRYDLKDTKTTLELGEDEITVNTDSEFTLTA
Full Text Available WP_015172420.1 ... 1117:22013 ... 1150:56822 1301283:78524 ... 63132:1744 1173025:1864 ... ... hypothetical protein Geitlerinema sp. PCC 7407 MTTANFSHKDVAEITEAEVAALANRLEDDDYSSVFEGLEDWHLLRAIAFQRPELVEPYIHLLDLEAYDEA
Full Text Available YP_007108952.1 1117:2626 1150:6080 63132:1307 1173025:1307 hypothetical protein GEI7407_1406 Geit...lerinema sp. PCC 7407 MFQPSTVLPSSDRPVSHEIRYERSFLLDLKNLEPAVYQRVFQFVFQDKLTLTQIQEMPGFRQIYASPIFYRFELSDCLIGVEITGQIVKFLRVIPKPDI ...
Full Text Available YP_007108232.1 1117:12067 1150:5751 63132:801 1173025:801 protein of unknown function DUF520 Geit...lerinema sp. PCC 7407 MASTYSFDIVSDFDRQELVNAIDQTTREIGTRYDLKDTKTTLELGEDEITVNTDSEFTLTAVHTILQTKAAK
Full Text Available WP_015171589.1 ... 1117:22124 ... 1150:57295 1301283:79049 ... 63132:217 1173025:1268 ... hypothetical protein Geit...lerinema sp. PCC 7407 MAQLQRLDIVGDDGQTIEIFVEEKDAPVLATSPNRDGRPSMGAGSPSVKMQQMQQVIRGYATYALNAFKDFSAAEVEEITLMFGVKLSASAGIPYIANGTTDSNLEVQVKCRFPAKDG
Full Text Available WP_015170759.1 ... 1117:22502 ... 1150:57505 1301283:79283 ... 63132:2359 1173025:820 ... iojap family protein Geit...lerinema sp. PCC 7407 MMNFDTSDNRPTTSFEPLPTEAPVPSSDSQEAHELVKTIVEAAEDRKGAEITVLRVSEVSY
Full Text Available WP_015173038.1 ... 1117:4673 ... 1150:58391 1301283:80267 ... 63132:3156 1173025:2219 ... hypothetical protein Geit...lerinema sp. PCC 7407 MNKLLTLTVLGCVLSAAPAAIAAEWREITRNDVGDRFMIDTSSLDRRGSSVWFWEYRDFPQ
Full Text Available YP_007108297.1 1117:12245 1150:7393 63132:850 1173025:850 hypothetical protein GEI7407_0747 Geit...lerinema sp. PCC 7407 MGIKRQIEITPLHCIHPGKGLEICPLDQAATATHTNAEQPTWGHETTLVTLAPGTIEDLFVHHFQTDQLLVVQ
Full Text Available YP_007107948.1 1117:753 1150:5622 63132:560 1173025:560 hypothetical protein GEI7407_0395 Geit...lerinema sp. PCC 7407 MTGFGAGESGQAPERSGFEPELGGFLRDAAQRSGLEPELGGVLRQRGVYVDEITCIGCKHCAHVARNTFYIEPDY
Full Text Available YP_007109407.1 1117:4756 1150:3024 63132:837 1173025:837 Peptidoglycan-binding domain 1 protein Geit...lerinema sp. PCC 7407 MKLQTIFRLLTIAVGLLGGSFTQPAIAHSLASQQTSPVLIATAYTDLTLPTLRQGDRGRSVELLQNILLDNGFLGAAGVRLGNPQGAIVDGIFGEITAAAIRDLQRRYEIPVTGQVNPTTWEVLDMYENPYRSPLPWKQ ...
Full Text Available WP_015173077.1 ... 1117:22894 ... 1150:58400 1301283:80278 ... 63132:3164 1173025:2236 ... ... hypothetical protein Geitlerinema sp. PCC 7407 MAKPWQKVLLAVALVGGVWGVSPAIAGTCASNCGPKPLQFIPGQQVKLQIINRTASIIEIQKVYGTDPVALRPGQEIT
Full Text Available WP_015170419.1 ... 1117:23009 ... 1150:58655 1301283:80560 ... 63132:3394 1173025:527 ... hypothetical protein Geit...lerinema sp. PCC 7407 MAASDDFKQQIRDGNLSDALKLALSEAIHLEITTWVSSPEQGDRAAMPGSRMRTRINVVDG
Full Text Available YP_007110665.1 1117:24697 1150:4821 63132:2039 1173025:2039 photosystem II reaction... center protein Psb28 Geitlerinema sp. PCC 7407 MAQIQFSPGVSEAVIPDVRLTRARDGSSGTATFYFERPQALVGESNAEITGMYLVDEEGQVMTREVKAKFINGQPEALEATYIMRSSEEWDRFMRFMERYAEEHGLGFSKS ...
Full Text Available WP_015170464.1 ... 1117:946 ... 1150:58664 1301283:80570 ... 63132:3401 1173025:562 ... hypothetical protein Geit...lerinema sp. PCC 7407 MTGFGAGESGQAPERSGFEPELGGFLRDAAQRSGLEPELGGVLRQRGVYVDEITCIGCKHCAH
Full Text Available WP_015171590.1 ... 1117:23095 ... 1150:57940 1301283:79766 ... 63132:2750 1173025:1386 ... ... hypothetical protein Geitlerinema sp. PCC 7407 MTQSSDSSTSQAKHQRSFAEWVSFAIAAAIIASVLGLVAYTWATGDTQPPVLETEITPEV
Full Text Available YP_007110171.1 1117:85 1150:6948 63132:2007 1173025:2007 hypothetical protein GEI7407_2646 Geit...lerinema sp. PCC 7407 MDPLFWLGLSILLVAVSLTALLFVAIPAFQELGRAARSAEKLFDTLNRELPPTLESIRLTGLEITELTEDVSDG
Full Text Available YP_007109341.1 1117:3927 1150:1735 63132:1535 1173025:1535 hypothetical protein GEI7407_1805 Geit...lerinema sp. PCC 7407 MKIETRRFLLRDFIPADDAAFLAYHVEPRFAEFCSPAEITPSFNRNLLQQFNQWANEYPRHNYQLAIVSRRD
Full Text Available YP_006996887.1 NC_019440 1117:5824 ... 1161:889 ... 1162:2120 1163:3278 46234:816 ... plasmid addiction system poison protein Anabaena sp. 90 MLINLNENINYTVVIGIDAQDFFESASATLQKKLDRCFEILKIEPRNYPNIKALKGEFSGYYRYRVGDYRVIYEIDDNSKLVTILLIAHRSKVYE ...
Full Text Available 58024:13655 3398:13655 71240:8416 91827:8416 71275:10680 91836:5614 3699:5614 3700:5614 980083:5614 3701:5614 3702:5735 post-illumin...ation chlorophyll fluorescence increase protein Arabidopsis thaliana MAAAANTSAVFASP
Full Text Available 655 58024:13655 3398:13655 71240:8416 91827:8416 71275:10680 91836:5614 3699:5614 3700:5614 980083:5614 3701:5614 3702:5735 post-illu...mination chlorophyll fluorescence increase protein Arabidopsis thaliana MNDTVYSSRIG
Full Text Available 655 58024:13655 3398:13655 71240:8416 91827:8416 71275:10680 91836:5614 3699:5614 3700:5614 980083:5614 3701:5614 3702:5735 post-illu...mination chlorophyll fluorescence increase protein Arabidopsis thaliana MIDYFDRYKLP
Full Text Available 655 58024:13655 3398:13655 71240:8416 91827:8416 71275:10680 91836:5614 3699:5614 3700:5614 980083:5614 3701:5614 3702:5735 post-illu...mination chlorophyll fluorescence increase protein Arabidopsis thaliana MAAAANTSAVF
Full Text Available YP_008713728.1 NC_022600 1117:3961 ... 307596:1867 307595:1867 ... 33071:1867 1416614:2612 1183438:2612 ... DNA internalization-related competence protein ComEC/Rec2 Gloeobacter kilaueensis JS1 ME...
Full Text Available YP_478949.1 NC_007776 1117:6436 ... 1118:902 1301283:25411 ... 1129:1600 321332:1945 ... bacteriochlorophyll delivery (BCD) family protein Synechococcus sp. JA-2-3B'a(2-13) MANIRRDPEVNASVASVTPASP...
Chaudhuri, Rima; Sadrieh, Arash; Hoffman, Nolan J; Parker, Benjamin L; Humphrey, Sean J; Stöckli, Jacqueline; Hill, Adam P; James, David E; Yang, Jean Yee Hwa
Most biological processes are influenced by protein post-translational modifications (PTMs). Identifying novel PTM sites in different organisms, including humans and model organisms, has expedited our understanding of key signal transduction mechanisms. However, with the increasing availability of deep, quantitative datasets in diverse species, there is a growing need for tools that facilitate cross-species comparison of PTM data. This is particularly important because functionally important modification sites are more likely to be evolutionarily conserved; yet cross-species comparison of PTMs is difficult because they often lie in structurally disordered protein domains. Current tools that address this can only map known PTMs between species based on known orthologous phosphosites, and do not enable cross-species mapping of newly identified modification sites. Here, we addressed this by developing a web-based software tool, PhosphOrtholog (www.phosphortholog.com), that accurately maps protein modification sites between different species. This facilitates the comparison of datasets derived from multiple species and should be a valuable tool for the proteomics community. Here we describe PhosphOrtholog, a web-based application for mapping known and novel orthologous PTM sites from experimental data obtained from different species. PhosphOrtholog is the only generic and automated tool that enables cross-species comparison of large-scale PTM datasets without relying on existing PTM databases. This is achieved through pairwise sequence alignment of orthologous protein residues. To demonstrate its utility, we apply it to two sets of human and rat muscle phosphoproteomes generated following insulin and exercise stimulation, respectively, and to one publicly available mouse phosphoproteome following cellular stress, revealing high mapping and coverage efficiency. Although coverage statistics are dataset dependent, PhosphOrtholog increased the number of cross-species mapped sites
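The abstract states that sites are mapped "through pairwise sequence alignment of orthologous protein residues". The sketch below illustrates that general idea only; it is not PhosphOrtholog's code. It assumes a plain Needleman-Wunsch global alignment with toy identity scores (real tools typically use substitution matrices such as BLOSUM62 and affine gap penalties), and the names `global_align` and `map_site`, as well as the rule that only identical residues count as a mapped site, are hypothetical.

```python
from typing import List, Optional, Tuple

# Toy identity scores (assumption); real aligners use substitution matrices.
MATCH, MISMATCH, GAP = 1, -1, -2


def _pair_score(x: str, y: str) -> int:
    return MATCH if x == y else MISMATCH


def global_align(a: str, b: str) -> Tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + _pair_score(a[i - 1], b[j - 1]),
                score[i - 1][j] + GAP,
                score[i][j - 1] + GAP,
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + _pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def map_site(a: str, b: str, pos_a: int) -> Optional[int]:
    """Map a 1-based residue position in protein `a` to its aligned position in `b`.

    Returns None when the residue faces a gap or a different amino acid
    (a hypothetical rule: only identical residues count as a mapped site).
    """
    aln_a, aln_b = global_align(a, b)
    ia = ib = 0  # residues consumed so far in each sequence
    for ca, cb in zip(aln_a, aln_b):
        ia += int(ca != "-")
        ib += int(cb != "-")
        if ca != "-" and ia == pos_a:
            return ib if ca == cb else None
    return None


if __name__ == "__main__":
    # Toy "orthologs": the second sequence carries a one-residue insertion (Q).
    print(map_site("MKTSPLE", "MKTQSPLE", 4))  # 5
```

Here the serine at position 4 of the first toy sequence maps to position 5 of the second because of the inserted residue, which is exactly the offset that a naive position-by-position comparison would miss.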
Knowles Donald P
Full Text Available Abstract Background Rhipicephalus (Boophilus) microplus is an economically important tick of cattle involved in the transmission of Babesia bovis, the etiological agent of bovine babesiosis. Commercial anti-tick vaccines based on the R. microplus Bm86 glycoprotein have shown some effect in controlling tick infestation; however, their efficacy as a stand-alone solution for tick control has been questioned. Understanding the role of the Bm86 gene product in tick biology is critical to identifying additional ways of using Bm86 to reduce R. microplus infestation and Babesia transmission. Additionally, the role played by Bm86 in R. microplus fitness during B. bovis infection is unknown. Results Here we show, in two independent experiments, that RNA interference-mediated silencing of Bm86 decreased the fitness of R. microplus females fed on cattle during acute B. bovis infection. Notably, Bm86 silencing decreased the number and survival of engorged females and decreased the weight of egg masses. However, gene silencing had no significant effect on the efficiency of transovarial transmission of B. bovis from surviving female ticks to their larval offspring. The results also show that, in addition to gut cells, Bm86 is expressed in larvae, nymphs, adult males and ovaries of partially engorged adult R. microplus females, and that its expression was significantly down-regulated in ovaries of ticks fed on B. bovis-infected cattle. Conclusion The R. microplus Bm86 gene plays a critical role during tick feeding and, after repletion, during blood digestion in ticks fed on cattle during acute B. bovis infection. Therefore, the data indirectly support the rationale for using Bm86-based vaccines, perhaps in combination with acaricides, to control tick infestation, particularly in B. bovis endemic areas.
Full Text Available Protein (Cyanobacteria) - PGDBj - Ortholog DB | LSDB Archive. Data name: Protein (Cyanobacteria)
Full Text Available Protein (Viridiplantae) - PGDBj - Ortholog DB | LSDB Archive. Data name: Protein (Viridiplantae)
Trachana, Kalliopi; Larsson, Tomas A; Powell, Sean; Chen, Wei-Hua; Doerks, Tobias; Muller, Jean; Bork, Peer
The increasing number of sequenced genomes has prompted the development of several automated orthology prediction methods. Tests to evaluate the accuracy of predictions and to explore biases caused by biological and technical factors are therefore required. We used 70 manually curated families to analyze the performance of five public methods in Metazoa. We analyzed the strengths and weaknesses of the methods and quantified the impact of biological and technical challenges. From the latter part of the analysis, genome annotation emerged as the largest single factor, affecting up to 30% of the performance. Generally, most methods did well in assigning orthologous groups, but they failed to assign the exact number of genes for half of the groups. The publicly available benchmark set (http://eggnog.embl.de/orthobench/) should facilitate the improvement of current orthology assignment protocols, which is of utmost importance for many fields of biology and should be tackled by a broad scientific community. Copyright © 2011 WILEY Periodicals, Inc.
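The finding that most methods found the right orthologous group but not its exact gene count can be made concrete by comparing each predicted group with its curated family. The sketch below is a hypothetical illustration of such a comparison, not the benchmark's published scoring scheme; the names `compare_group` and `fraction_exact` and the gene IDs are invented for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class GroupComparison:
    missing: FrozenSet[str]  # curated members the method did not place in the group
    extra: FrozenSet[str]    # predicted members that are absent from the curated family

    @property
    def exact(self) -> bool:
        """True when the predicted group has exactly the curated membership."""
        return not self.missing and not self.extra


def compare_group(curated: Iterable[str], predicted: Iterable[str]) -> GroupComparison:
    cur, pred = frozenset(curated), frozenset(predicted)
    return GroupComparison(missing=cur - pred, extra=pred - cur)


def fraction_exact(pairs: Iterable[Tuple[Iterable[str], Iterable[str]]]) -> float:
    """Share of (curated, predicted) group pairs whose membership matches exactly."""
    results = [compare_group(cur, pred).exact for cur, pred in pairs]
    return sum(results) / len(results) if results else 0.0


if __name__ == "__main__":
    # Hypothetical family: the method misses one fly gene and adds one worm gene.
    cmp = compare_group({"hs_A", "mm_A", "dm_A"}, {"hs_A", "mm_A", "ce_X"})
    print(sorted(cmp.missing), sorted(cmp.extra), cmp.exact)
```

Splitting errors into `missing` and `extra` keeps under-clustering (lost genes, for example from poor gene annotation) separate from over-clustering (extra genes pulled in), which a single accuracy number would blur.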
Sven Heinicke; Michael S Livstone; Charles Lu; Rose Oughtred; Fan Kang; Samuel V Angiuoli; Owen White; David Botstein; Kara Dolinski
Many biological databases that provide comparative genomics information and tools are now available on the internet. While certainly quite useful, to our knowledge none of the existing databases combine results from multiple comparative genomics methods with manually curated information from the literature. Here we describe the Princeton Protein Orthology Database (P-POD, http://ortholog.princeton.edu), a user-friendly database system that allows users to find and visualize the phylogenetic r...
Full Text Available Abstract Background The signal peptide is one of the most important motifs involved in protein trafficking, and it ultimately influences protein function. Given the expected functional conservation among orthologs, it was hypothesized that divergence in signal peptides within orthologous groups is mainly due to misannotation of N-terminal protein sequences. Thus, discrepancies in signal peptide prediction among orthologous proteins were used to identify misannotated proteins in five Plasmodium species. Methods Signal peptide (SignalP) and orthology (OrthoMCL) predictions were combined in an innovative strategy to identify orthologous groups showing discrepancies in signal peptide prediction among their protein members (Mixed groups). In a comparative analysis, multiple alignments and gene models for each of these groups were visually inspected in search of misannotated proteins and, whenever possible, alternative gene models were proposed. Thresholds for signal peptide prediction parameters were also modified to reduce their impact as a possible source of discrepancy among orthologs. Validation of new gene models was based on RT-PCR (a few examples) or on previously published experimental evidence (ApiLoc). Results The rate of misannotated proteins was significantly higher in Mixed groups than in Positive or Negative groups, corroborating the proposed hypothesis. A total of 478 proteins were reannotated, and a change of signal peptide prediction from negative to positive was the most common outcome. Reannotations triggered the conversion of almost 50% of all Mixed groups, which were further reduced by optimization of signal peptide prediction parameters. Conclusions The methodological novelty proposed here, combining orthology and signal peptide prediction, proved to be an effective strategy for identifying proteins with incorrectly annotated N-terminal sequences, and it might have an important impact on the available data for genome-wide searching of potential vaccine and drug
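The core of the strategy is sorting orthologous groups by whether their members agree on signal-peptide prediction. A minimal sketch, assuming each group is given as a mapping from protein ID to a boolean SignalP call (a hypothetical input format; the function names and IDs are also invented), could look like this:

```python
from collections import Counter
from typing import Mapping


def classify_group(calls: Mapping[str, bool]) -> str:
    """Label an orthologous group from its members' signal-peptide predictions.

    `calls` maps protein ID -> True if a signal peptide is predicted.
    """
    values = set(calls.values())
    if not values:
        raise ValueError("orthologous group has no members")
    if values == {True}:
        return "Positive"
    if values == {False}:
        return "Negative"
    return "Mixed"  # members disagree: candidates for N-terminal misannotation


def summarize(groups: Mapping[str, Mapping[str, bool]]) -> Counter:
    """Count Positive, Negative and Mixed groups."""
    return Counter(classify_group(members) for members in groups.values())


if __name__ == "__main__":
    groups = {
        "OG1": {"pf_1": True, "pv_1": True},
        "OG2": {"pf_2": False, "pv_2": True},  # discordant calls
        "OG3": {"pf_3": False},
    }
    print(summarize(groups))
```

Only the Mixed groups would then go on to alignment inspection; the sketch does not attempt to decide which member is misannotated, since the abstract describes that step as manual.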
Medvedeva, Irina V; Demenkov, Pavel S; Ivanisenko, Vladimir A
Functional sites define the diversity of protein functions and are a central object of research on the structural and functional organization of proteins. The mechanisms underlying the emergence of protein functional sites and their variability during evolution include duplication, shuffling, insertion, and deletion of exons in genes. Studying the correlation between site structure and exon structure provides a basis for an in-depth understanding of site organization. In this regard, developing software resources that project the exon structure of genes onto the primary and tertiary structures of the encoded proteins, and vice versa, remains a relevant problem. Previously, we developed the SitEx system, which provides protein and gene sequences with mapped exon borders and the amino acid positions of protein functional sites. The database included information on proteins with known 3D structure; however, it contained no data on orthologs. Therefore, in SitEx 2.0 we added the projection of site positions onto the exon structures of orthologs. We also implemented database searches by site conservation variability and by site discontinuity across the exon structure. Including ortholog information expands the possibilities for using SitEx to analyze the structural and functional organization of proteins. Database URL: http://www-bionet.sscc.ru/sitex/ .
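To make the projection of exon structure onto protein positions concrete, here is a minimal Python sketch (not SitEx code) that assigns residues to exons from each exon's coding length and flags a functional site as discontinuous when its residues fall in more than one exon. It assumes exon lengths count only coding nucleotides starting at the start codon, and it assigns a codon split across an exon border to the upstream exon; the function names, exon lengths, and site residues are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate
from typing import Iterable, List, Set


def exon_of_residue(residue: int, exon_cds_lengths: List[int]) -> int:
    """0-based index of the exon containing the first nucleotide of the codon
    for a 1-based residue position. A codon split across an exon border is
    assigned to the upstream exon (a simplifying assumption)."""
    ends = list(accumulate(exon_cds_lengths))  # inclusive CDS end of each exon
    first_nt = 3 * (residue - 1) + 1           # 1-based position in the CDS
    idx = bisect_left(ends, first_nt)
    if residue < 1 or idx == len(ends):
        raise ValueError(f"residue {residue} lies outside the coding sequence")
    return idx


def site_exons(site_residues: Iterable[int], exon_cds_lengths: List[int]) -> Set[int]:
    """Exons touched by the residues of one functional site."""
    return {exon_of_residue(r, exon_cds_lengths) for r in site_residues}


if __name__ == "__main__":
    exons = [30, 45, 60]  # coding lengths (nt) of three hypothetical exons
    site = [5, 12, 30]    # hypothetical functional-site residues
    covered = site_exons(site, exons)
    print(sorted(covered), "discontinuous" if len(covered) > 1 else "single exon")
    # [0, 1, 2] discontinuous
```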
Nicholas J Marini
Computational predictions of the functional impact of genetic variation play a critical role in human genetics research. For nonsynonymous coding variants, most prediction algorithms make use of patterns of amino acid substitutions observed among homologous proteins at a given site. In particular, substitutions observed in orthologous proteins from other species are often assumed to be tolerated in the human protein as well. We examined this assumption by evaluating a panel of nonsynonymous mutants of a prototypical human enzyme, methylenetetrahydrofolate reductase (MTHFR), in a yeast cell-based functional assay. As expected, substitutions in human MTHFR at sites that are well conserved across distant orthologs result in an impaired enzyme, while substitutions present in recently diverged sequences (including a 9-site mutant that "resurrects" the human-macaque ancestor) result in a functional enzyme. We also interrogated 30 sites with varying degrees of conservation by creating substitutions in the human enzyme that are accepted in at least one ortholog of MTHFR. Surprisingly, most of these substitutions were deleterious to the human enzyme. The results suggest that selective constraints vary between phylogenetic lineages, such that including distant orthologs to infer selective pressures on the human enzyme may be misleading. We propose that homologous proteins are best used to reconstruct ancestral sequences and to infer amino acid conservation only among the direct lineal ancestors of a particular protein. We show that such an "ancestral site preservation" measure outperforms other prediction methods, not only in our selected set for MTHFR but also in an exhaustive set of E. coli LacI mutants.
Tatarinova, Tatiana; Salih, Bilal; Dien Bard, Jennifer; Cohen, Irit; Bolshoy, Alexander
Proteins of the same functional family (for example, kinases) may have significantly different lengths. It is an open question whether such variation in length is random or arises in response to some unknown evolutionary driving factors. The main purpose of this paper is to demonstrate the existence of factors affecting prokaryotic gene lengths. We believe that ranking genomes according to the lengths of their genes, followed by calculating coefficients of association between genome rank and genome properties, is a reasonable approach to revealing such evolutionary driving factors. As we demonstrated earlier, our chosen approach, Bubble Sort, combines stability, accuracy, and computational efficiency compared with other ranking methods. Application of Bubble Sort to a set of 1390 prokaryotic genomes confirmed that genes of Archaeal species are generally shorter than Bacterial ones. We observed that gene lengths are affected by various factors: within each domain, different phyla have preferences for short or long genes; thermophiles tend to have shorter genes than soil dwellers; and halophiles tend to have longer genes. We also found that species with an overrepresentation of cytosines and guanines in the third codon position (GC3 content) tend to have longer genes than species with low GC3 content.
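The rank-then-associate idea can be illustrated on a toy example. The sketch below uses a plain bubble sort with a hypothetical comparator chosen here for illustration only (genome A is placed before genome B if, over the gene families they share, A's gene is shorter more often than it is longer), which may differ from the authors' pairwise criterion. It then computes Spearman's rank correlation between genome rank and a genome property such as GC3 content, assuming no tied values. Genome names, gene families, lengths, and GC3 values are all made up.

```python
from typing import Dict, List, Sequence

Genome = Dict[str, int]  # hypothetical: gene family ID -> gene length (bp)


def has_shorter_genes(a: Genome, b: Genome) -> bool:
    """Hypothetical comparator: True if, over gene families shared by a and b,
    a's gene is shorter more often than it is longer."""
    shared = a.keys() & b.keys()
    shorter = sum(1 for f in shared if a[f] < b[f])
    longer = sum(1 for f in shared if a[f] > b[f])
    return shorter > longer


def bubble_rank(names: Sequence[str], genomes: Dict[str, Genome]) -> List[str]:
    """Order genomes from shortest to longest genes with a plain bubble sort."""
    order = list(names)
    for i in range(len(order) - 1):
        swapped = False
        for j in range(len(order) - 1 - i):
            # Move the right-hand genome left if its genes are shorter.
            if has_shorter_genes(genomes[order[j + 1]], genomes[order[j]]):
                order[j], order[j + 1] = order[j + 1], order[j]
                swapped = True
        if not swapped:
            break  # a full pass without swaps: the order is stable
    return order


def spearman_no_ties(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho via 1 - 6*sum(d^2)/(n(n^2-1)); assumes no tied values."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    rank_x = {v: r for r, v in enumerate(sorted(x))}
    rank_y = {v: r for r, v in enumerate(sorted(y))}
    d2 = sum((rank_x[a] - rank_y[b]) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / (n * (n * n - 1))


if __name__ == "__main__":
    genomes = {
        "gA": {"f1": 900, "f2": 1200},
        "gB": {"f1": 1000, "f2": 1300},
        "gC": {"f1": 1100, "f2": 1500},
    }
    order = bubble_rank(["gC", "gA", "gB"], genomes)
    gc3 = {"gA": 0.40, "gB": 0.50, "gC": 0.70}  # hypothetical GC3 values
    rho = spearman_no_ties(list(range(len(order))), [gc3[g] for g in order])
    print(order, rho)  # ['gA', 'gB', 'gC'] 1.0
```

A pairwise comparator is what makes bubble sort a natural fit here: it only ever compares adjacent genomes, so it needs no global "gene length" score per genome.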
One of the fundamental goals of genetics is to understand gene functions and their associated phenotypes. To this end, we developed a computational algorithm that uses orthology and protein-protein interaction information to infer gene-phenotype associations for multiple species. We also developed a web server that provides genome-wide phenotype inference for six species: fly, human, mouse, worm, yeast, and zebrafish. We evaluated our inference method by comparing the inferred results with known gene-phenotype associations; the high area under the curve (AUC) values indicate that the method performs well. By applying our method to two representative human diseases, type 2 diabetes and breast cancer, we demonstrated that it is able to identify related Gene Ontology terms and Kyoto Encyclopedia of Genes and Genomes pathways. The web server can be used to infer the functions and putative phenotypes of a gene, as well as the candidate genes for a phenotype, and thus aids in disease candidate gene discovery. Our web server is available at http://jjwanglab.org/PhenoPPIOrth.
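The area under the ROC curve used in such an evaluation can be computed directly as the probability that a known gene-phenotype association scores higher than a non-association, with ties counting one half (the Mann-Whitney formulation). The Python sketch below illustrates that definition; it is not taken from the PhenoPPIOrth code, and the scores and labels are hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a randomly chosen known association
    (label 1) scores higher than a randomly chosen non-association (label 0);
    ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical inference scores for four gene-phenotype pairs.
    print(roc_auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 0.75
```

The quadratic loop keeps the definition visible; for large evaluation sets a sort-based rank computation gives the same value faster.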
BACKGROUND: Usher syndrome (USH) is the most frequent hereditary deaf-blindness disorder in humans. Deafness is attributed to the disorganization of stereocilia in the inner ear. USH1, the most severe subtype, is associated with mutations in genes encoding myosin VIIa, harmonin, cadherin 23, protocadherin 15, and sans. Myosin VIIa, harmonin, cadherin 23, and protocadherin 15 physically interact in vitro and localize to stereocilia tips in vivo, indicating that they form functional complexes. Sans, in contrast, localizes to vesicle-like structures beneath the apical membrane of stereocilia-displaying hair cells. How mutations in sans result in deafness and blindness is not well understood. Orthologs of myosin VIIa and protocadherin 15 have been identified in Drosophila melanogaster, and their genetic analysis has revealed essential roles in auditory perception and microvilli morphogenesis, respectively. PRINCIPAL FINDINGS: Here, we have identified and characterized the Drosophila ortholog of human sans. Drosophila Sans is expressed in tubular organs of the embryo, in lens-secreting cone cells of the adult eye, and in microvilli-displaying follicle cells during oogenesis. Sans mutants are viable and fertile, and mutant follicle cells appear to form microvilli, indicating that Sans is dispensable for fly development and for microvilli morphogenesis in the follicle epithelium. In follicle cells, Sans protein localizes, like its vertebrate ortholog, to intracellular punctate structures, which we have identified as early endosomes associated with the syntaxin Avalanche. CONCLUSIONS: Our work is consistent with an evolutionarily conserved function of Sans in vesicle trafficking. Furthermore, it provides a significant basis for further understanding the role of this Usher syndrome ortholog in development and disease.
Abstract Background Large-scale identification of the interrelationships between different components of the cell, such as the interactions between proteins, has recently attracted great interest. However, unraveling large-scale protein-protein interaction maps is laborious and expensive. Moreover, assessing the reliability of the interactions can be cumbersome. Results In this study, we have developed a computational method that exploits existing knowledge on protein-protein interactions in diverse species through orthologous relations on the one hand, and functional association data on the other, to predict and filter protein-protein interactions in Arabidopsis thaliana. A highly reliable set of protein-protein interactions is predicted through this integrative approach, making use of existing protein-protein interaction data from yeast, human, C. elegans, and D. melanogaster. Localization, biological process, and co-expression data are used as powerful indicators for protein-protein interactions. The functional repertoire of the identified interactome reveals interactions between proteins functioning in well-conserved as well as plant-specific biological processes. We observe that although common mechanisms (e.g. actin polymerization) and components (e.g. ARPs, actin-related proteins) exist between different lineages, they are active in specific processes such as growth, cancer metastasis, and trichome development in yeast, human, and Arabidopsis, respectively. Conclusion We conclude that the integration of orthology with functional association data is adequate to predict protein-protein interactions. Through this approach, a large number of novel protein-protein interactions with diverse biological roles were discovered. Overall, we have predicted a reliable set of protein-protein interactions suitable for further computational as well as experimental analyses.
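The orthology step of such an approach is commonly described as transferring "interologs": if two proteins interact in a source species, their orthologs in the target species are predicted to interact. The Python sketch below shows that transfer followed by a single functional filter (shared subcellular localization). It is an illustration under stated assumptions rather than the authors' method, which combines localization, biological process, and co-expression evidence; the function names, protein identifiers, and compartments are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Pair = FrozenSet[str]


def transfer_interactions(source_ppis: Iterable[Tuple[str, str]],
                          orthologs: Dict[str, List[str]]) -> Set[Pair]:
    """Map each source-species interaction (a, b) onto every pair of target
    orthologs (x, y); self-pairs are skipped here for simplicity."""
    predicted: Set[Pair] = set()
    for a, b in source_ppis:
        for x, y in product(orthologs.get(a, []), orthologs.get(b, [])):
            if x != y:
                predicted.add(frozenset((x, y)))
    return predicted


def filter_by_localization(pairs: Iterable[Pair],
                           localization: Dict[str, Set[str]]) -> Set[Pair]:
    """Keep predicted pairs whose members share at least one compartment."""
    kept: Set[Pair] = set()
    for pair in pairs:
        x, y = tuple(pair)
        if localization.get(x, set()) & localization.get(y, set()):
            kept.add(pair)
    return kept


if __name__ == "__main__":
    yeast_ppis = [("YA", "YB"), ("YA", "YC")]
    orthologs = {"YA": ["At1"], "YB": ["At2", "At3"], "YC": []}
    loc = {"At1": {"nucleus"}, "At2": {"nucleus"}, "At3": {"plastid"}}
    candidates = transfer_interactions(yeast_ppis, orthologs)
    print(sorted(sorted(p) for p in filter_by_localization(candidates, loc)))
    # [['At1', 'At2']]
```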
Huson, Daniel H; Tappu, Rewati; Bazinet, Adam L; Xie, Chao; Cummings, Michael P; Nieselt, Kay; Williams, Rohan
Microbiome sequencing projects typically collect tens of millions of short reads per sample. Depending on the goals of the project, the short reads can either be subjected to direct sequence analysis or be assembled into longer contigs. The assembly of whole genomes from metagenomic sequencing reads is a very difficult problem. However, for some questions, only specific genes of interest need to be assembled. This is then a gene-centric assembly where the goal is to assemble reads into contigs for a family of orthologous genes. We present a new method for performing gene-centric assembly, called protein-alignment-guided assembly, and provide an implementation in our metagenome analysis tool MEGAN. Genes are assembled on the fly, based on the alignment of all reads against a protein reference database such as NCBI-nr. Specifically, the user selects a gene family based on a classification such as KEGG and all reads binned to that gene family are assembled. Using published synthetic community metagenome sequencing reads and a set of 41 gene families, we show that the performance of this approach compares favorably with that of full-featured assemblers and that of a recently published HMM-based gene-centric assembler, both in terms of the number of reference genes detected and of the percentage of reference sequence covered. Protein-alignment-guided assembly of orthologous gene families complements whole-metagenome assembly in a new and very useful way.
J Peter Svensson
Studies in Saccharomyces cerevisiae show that many proteins influence cellular survival upon exposure to DNA damaging agents. We hypothesized that human orthologs of these S. cerevisiae proteins would also be required for cellular survival after treatment with DNA damaging agents. For this purpose, human homologs of S. cerevisiae proteins were identified and mapped onto the human protein-protein interaction network. The resulting human network was highly modular, and a series of selection rules was implemented to identify 45 candidate human toxicity-modulating proteins. The corresponding transcripts were targeted by RNA interference in human cells. The cell lines with depleted target expression were challenged with three DNA damaging agents: the alkylating agents MMS and 4-NQO, and the oxidizing agent t-BuOOH. A comparison of survival revealed that the majority (74%) of the proteins conferred either sensitivity or resistance. The identified human toxicity-modulating proteins represent a variety of biological functions: autophagy, chromatin modification, RNA and protein metabolism, and telomere maintenance. Further studies revealed that MMS-induced autophagy increases the survival of cells treated with DNA damaging agents. In summary, we show that damage recovery proteins in humans can be identified through homology to S. cerevisiae and that many of the same pathways are represented among the toxicity modulators.
Leal, Fernanda Munhoz Dos Anjos; Virginio, Veridiana Gomes; Martello, Carolina Lumertz; Paes, Jéssica Andrade; Borges, Thiago J; Jaeger, Natália; Bonorino, Cristina; Ferreira, Henrique Bunselmeyer
Mycoplasma hyopneumoniae and Mycoplasma flocculare are two genetically close species found in the swine respiratory tract. Despite their similarities, M. hyopneumoniae is the causative agent of porcine enzootic pneumonia, while M. flocculare is a commensal bacterium. Genomic and transcriptional comparative analyses have so far failed to explain the difference in pathogenicity between these two species. We therefore hypothesized that this difference might be explained, at least in part, by amino acid sequence and immunological or functional differences between orthologous surface proteins. In line with this, approximately 85% of the orthologous surface proteins of M. hyopneumoniae 7448 and M. flocculare were found to have one or more differential domains. To experimentally assess possible immunological implications of this kind of difference, the extracellular differential domains of one pair of orthologous surface proteins (MHP7448_0612 from M. hyopneumoniae and MF_00357 from M. flocculare) were expressed in E. coli and used to immunize mice. The recombinant polypeptides (rMHP612₆₇₋₁₆₉ and rMF357₆₇₋₁₉₆, respectively) induced distinct cellular immune responses: rMHP612₆₇₋₁₆₉ induced both Th1 and Th2 responses, whereas rMF357₆₇₋₁₉₆ induced only an early pro-inflammatory response. These results indicate that immunological properties determined by differential domains in orthologous surface proteins might play a role in pathogenicity, contributing to specific and differential immune responses against each species. Copyright © 2016 Elsevier B.V. All rights reserved.
Abstract Background It has repeatedly been shown that interacting protein families tend to have similar phylogenetic trees. These similarities can be used to predict the mapping between two families of interacting proteins (i.e. which proteins from one family interact with which members of the other). The correct mapping will be the one that maximizes the similarity between the trees. The two families may contain both orthologs and paralogs if members of the two families are present in more than one organism. This fact can be exploited to restrict the possible mappings, simply by forbidding links between proteins of different organisms. We present here an algorithm to predict the mapping between families of interacting proteins that is able to incorporate information regarding orthologs, or any other assignment of proteins to "classes" that may restrict possible mappings. Results For the first time in methods for predicting mappings, we have tested this new approach on a large number of interacting protein domains in order to statistically assess its performance. The method accurately predicts around 80% in the most favourable cases. We also analysed in detail the results of the method for a well-defined case of interacting families, the sensor and kinase components of the Ntr-type two-component system, for which up to 98% of the pairings predicted by the method were correct. Conclusion Based on the well-established relationship between tree similarity and interactions, we developed a method for predicting the mapping between two interacting families using genomic information alone. The program is available through a web interface.
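Choosing the mapping that maximizes tree similarity, with links restricted to proteins of the same class, can be illustrated by brute force on a toy example. The Python sketch below is not the authors' algorithm. It assumes each family is summarized by a pairwise distance matrix (as in mirrortree-style comparisons), scores a candidate mapping by the Pearson correlation between matched distances, and enumerates every permutation, which is only feasible for very small families of equal size. The function names, matrices, and organism labels are hypothetical.

```python
from itertools import permutations
from math import sqrt
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0


def best_mapping(da: Matrix, db: Matrix, class_a: Sequence[str],
                 class_b: Sequence[str]) -> Tuple[Optional[Tuple[int, ...]], float]:
    """Brute-force search over one-to-one mappings (A member i -> B member
    perm[i]) for the one whose matched distances correlate best, allowing
    only links between members of the same class (e.g. the same organism)."""
    n = len(da)
    if n < 3 or len(db) != n:
        raise ValueError("need two families of equal size, at least 3 members")
    xs = [da[i][j] for i in range(n) for j in range(i + 1, n)]
    best, best_score = None, float("-inf")
    for perm in permutations(range(n)):
        if any(class_a[i] != class_b[perm[i]] for i in range(n)):
            continue  # forbidden: a link across classes
        ys = [db[perm[i]][perm[j]] for i in range(n) for j in range(i + 1, n)]
        score = pearson(xs, ys)
        if score > best_score:
            best, best_score = perm, score
    return best, best_score


if __name__ == "__main__":
    da = [[0, 5, 1, 6], [5, 0, 7, 2], [1, 7, 0, 8], [6, 2, 8, 0]]
    db = [[0, 5, 2, 7], [5, 0, 6, 1], [2, 6, 0, 8], [7, 1, 8, 0]]
    orgs = ["org1", "org1", "org2", "org2"]  # organism of each member
    mapping, score = best_mapping(da, db, orgs, orgs)
    print(mapping, round(score, 3))  # (1, 0, 3, 2) 1.0
```

The class constraint is what keeps the search tractable in practice: here only 4 of the 24 permutations survive it.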
Sandhu, Devinder; Tasma, I Made; Frasch, Ryan; Bhattacharyya, Madan K
Systemic acquired resistance (SAR) is induced in non-inoculated leaves following infection with certain pathogenic strains. SAR is effective against many pathogens. Salicylic acid (SA) is a signaling molecule of the SAR pathway. The development of SAR is associated with the induction of pathogenesis-related (PR) genes. Arabidopsis non-expressor of PR1 (NPR1) is a regulatory gene of the SA signal pathway [1-3]. SAR in soybean was first reported following infection with Colletotrichum truncatum, which causes anthracnose disease. We investigated whether SAR in soybean is regulated by a pathway similar to the one characterized in Arabidopsis. The pathogenesis-related gene GmPR1 is induced following treatment of soybean plants with the SAR inducer 2,6-dichloroisonicotinic acid (INA) or infection with the oomycete pathogen Phytophthora sojae. In P. sojae-infected plants, SAR was induced against the bacterial pathogen Pseudomonas syringae pv. glycinea. The soybean GmNPR1-1 and GmNPR1-2 genes showed high identity to Arabidopsis NPR1 and similar expression patterns among the organs studied in this investigation. GmNPR1-1 and GmNPR1-2 are the only soybean homologues of NPR1 and are located in homoeologous regions. In Arabidopsis npr1-1 mutant plants transformed with GmNPR1-1 or GmNPR1-2, the SAR markers (i) PR-1 and (ii) BGL2 were induced following INA treatment and infection with Pseudomonas syringae pv. tomato (Pst), respectively, and SAR was induced following Pst infection. Of the five cysteine residues involved in the oligomer-monomer transition of NPR1 (Cys82, Cys150, Cys155, Cys160, and Cys216), Cys216 is substituted by Ser in GmNPR1-1 and by Leu in GmNPR1-2. Complementation analyses in Arabidopsis npr1-1 mutants revealed that the homoeologous GmNPR1-1 and GmNPR1-2 genes are orthologous to Arabidopsis NPR1. Therefore, the SAR pathway in soybean is most likely regulated by GmNPR1 genes. Substitution of the Cys216 residue, essential for the oligomer-monomer transition of Arabidopsis NPR1
Finan Christopher L
Abstract Background In Micrococcus luteus, growth and resuscitation from starvation-induced dormancy are controlled by the production of a secreted growth factor. This autocrine resuscitation-promoting factor (Rpf) is the founder member of a family of proteins found throughout, and confined to, the actinobacteria (high G + C Gram-positive bacteria). The aim of this work was to search for and characterise a cognate gene family in the firmicutes (low G + C Gram-positive bacteria) and to obtain information about how they may control bacterial growth and resuscitation. Results In silico analysis of the accessory domains of the Rpf proteins permitted their classification into several subfamilies. The RpfB subfamily is related to a group of firmicute proteins of unknown function, represented by YabE of Bacillus subtilis. The actinobacterial RpfB and firmicute YabE proteins have very similar domain structures and genomic contexts, except that in YabE the actinobacterial Rpf domain is replaced by another domain, which we have called Sps. Although totally unrelated in both sequence and secondary structure, the Rpf and Sps domains fulfil the same function. We propose that these proteins have undergone "non-orthologous domain displacement", a phenomenon akin to the previously described "non-orthologous gene displacement". Proteins containing the Sps domain are widely distributed throughout the firmicutes, and they too fall into a number of distinct subfamilies. Comparative analysis of the accessory domains in the Rpf and Sps proteins, together with their weak similarity to lytic transglycosylases, provides clear evidence that they are muralytic enzymes. Conclusions The results indicate that the firmicute Sps proteins and the actinobacterial Rpf proteins are cognate and that they control bacterial culturability via enzymatic modification of the bacterial cell envelope.
Background As orthologous proteins are expected to retain function more often than other homologs, they are often used to transfer functional annotation between species. However, ortholog identification methods do not take into account changes in domain architecture, which are likely to modify a protein's function. By domain architecture we refer to the sequential arrangement of domains along a protein sequence. To assess the level of domain architecture conservation among orthologs, we carried out a large-scale study of such events between human and 40 other species spanning the entire evolutionary range. We designed a score to measure domain architecture similarity and used it to analyze differences in domain architecture conservation between orthologs and paralogs relative to the conservation of primary sequence. We also statistically characterized the extent of different types of domain swapping events across pairs of orthologs and paralogs. Results The analysis shows that, for homologs that have diverged beyond a certain threshold, orthologs exhibit greater domain architecture conservation than paralogous homologs, even after compensating for differences in average sequence divergence. We interpret this as an indication of stronger selective pressure on orthologs than on paralogs to retain the domain architecture required for the proteins to perform a specific function. In general, orthologs as well as the closest paralogous homologs have very similar domain architectures, even at large evolutionary separation. The most common domain architecture changes observed in both ortholog and paralog pairs involved insertion/deletion of new domains, while domain shuffling and segment duplication/deletion were very infrequent. Conclusions On the whole, our results support the hypothesis that function conservation between orthologs demands higher domain architecture conservation than other types of homologs, relative to primary sequence conservation. This supports the
Bienkowski, Rick S.; Banerjee, Ayan; Rounds, J. Christopher; Rha, Jennifer; Omotade, Omotola F.; Gross, Christina; Morris, Kevin J.; Leung, Sara W.; Pak, ChangHui; Jones, Stephanie K.; Santoro, Michael R.; Warren, Stephen T.; Zheng, James Q.; Bassell, Gary J.; Corbett, Anita H.; Moberg, Kenneth H.
Summary The Drosophila dNab2 protein is an ortholog of human ZC3H14, a poly(A) RNA-binding protein required for intellectual function. dNab2 supports memory and axon projection, but its molecular role in neurons is undefined. Here we present a network of interactions that links dNab2 to cytoplasmic control of neuronal mRNAs in conjunction with the Fragile-X protein ortholog dFMRP. dNab2 and dfmr1 interact genetically in control of neurodevelopment and olfactory memory, and their encoded proteins co-localize in puncta within neuronal processes. dNab2 regulates CaMKII but not futsch mRNA, implying a selective role in control of dFMRP-bound transcripts. Reciprocally, dFMRP and vertebrate FMRP restrict mRNA poly(A)-tail length, similar to dNab2/ZC3H14. Parallel studies of murine hippocampal neurons indicate that ZC3H14 is also a cytoplasmic regulator of neuronal mRNAs. In sum, these findings suggest that dNab2 represses expression of a subset of dFMRP-target mRNAs, which could underlie brain-specific defects in patients lacking ZC3H14. PMID:28793261
Rick S. Bienkowski
The Drosophila dNab2 protein is an ortholog of human ZC3H14, a poly(A) RNA binding protein required for intellectual function. dNab2 supports memory and axon projection, but its molecular role in neurons is undefined. Here, we present a network of interactions that links dNab2 to cytoplasmic control of neuronal mRNAs in conjunction with the fragile X protein ortholog dFMRP. dNab2 and dfmr1 interact genetically in control of neurodevelopment and olfactory memory, and their encoded proteins co-localize in puncta within neuronal processes. dNab2 regulates CaMKII, but not futsch, implying a selective role in control of dFMRP-bound transcripts. Reciprocally, dFMRP and vertebrate FMRP restrict mRNA poly(A) tail length, similar to dNab2/ZC3H14. Parallel studies of murine hippocampal neurons indicate that ZC3H14 is also a cytoplasmic regulator of neuronal mRNAs. Altogether, these findings suggest that dNab2 represses expression of a subset of dFMRP-target mRNAs, which could underlie brain-specific defects in patients lacking ZC3H14.
Natale, D A; Shankavaram, U T; Galperin, M Y; Wolf, Y I; Aravind, L; Koonin, E V
Standard archival sequence databases have not been designed as tools for genome annotation and are far from optimal for this purpose. We used the database of Clusters of Orthologous Groups of proteins (COGs) to reannotate the genomes of two archaea, Aeropyrum pernix, the first member of the Crenarchaea to be sequenced, and Pyrococcus abyssi. A. pernix and P. abyssi proteins were assigned to COGs using the COGNITOR program; the results were verified on a case-by-case basis and augmented by additional database searches using the PSI-BLAST and TBLASTN programs. Functions were predicted for over 300 A. pernix proteins that could not be assigned a function using conventional methods with a conservative sequence similarity threshold, an increase of approximately 50% compared with the original annotation. A. pernix shares most of the conserved core of proteins that were previously identified in the Euryarchaeota. Cluster analysis or distance matrix tree construction based on the co-occurrence of genomes in COGs showed that A. pernix forms a distinct group within the archaea, although grouping with the two Pyrococcus species, indicative of similar repertoires of conserved genes, was observed. No indication of a specific relationship between Crenarchaeota and eukaryotes was obtained in these analyses. Several proteins that are conserved in Euryarchaeota and most bacteria are unexpectedly missing in A. pernix, including the entire set of de novo purine biosynthesis enzymes, the GTPase FtsZ (a key component of the bacterial and euryarchaeal cell-division machinery), and the tRNA-specific pseudouridine synthase, previously considered universal. A. pernix is represented in 48 COGs that do not contain any euryarchaeal members. Many of these proteins are TCA cycle and electron transport chain enzymes, reflecting the aerobic lifestyle of A. pernix. Special-purpose databases organized on the basis of phylogenetic analysis and carefully curated with respect to known and
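Clustering genomes by their co-occurrence in COGs starts from the set of COGs in which each genome is represented. As a hedged illustration, the Python sketch below computes a Jaccard distance between genomes from those sets and lists COGs that contain one genome but none of a chosen comparison group, in the spirit of the A. pernix COGs lacking euryarchaeal members. The Jaccard measure is chosen here for illustration and is not necessarily the distance used in the study; the function names and COG memberships are made up.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def cog_distances(cog_sets: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Pairwise Jaccard distance (1 - |A & B| / |A | B|) between genomes,
    computed from the sets of COGs in which each genome is represented."""
    dist: Dict[Tuple[str, str], float] = {}
    for g1, g2 in combinations(sorted(cog_sets), 2):
        a, b = cog_sets[g1], cog_sets[g2]
        union = a | b
        dist[(g1, g2)] = (1.0 - len(a & b) / len(union)) if union else 0.0
    return dist


def cogs_absent_from(genome: str, others: Iterable[str],
                     cog_sets: Dict[str, Set[str]]) -> Set[str]:
    """COGs containing `genome` but none of the `others` genomes."""
    seen_elsewhere: Set[str] = set()
    for g in others:
        seen_elsewhere |= cog_sets[g]
    return cog_sets[genome] - seen_elsewhere


if __name__ == "__main__":
    cogs = {  # made-up memberships for illustration
        "Aper": {"COG0001", "COG0002", "COG0003"},
        "Paby": {"COG0001", "COG0002", "COG0004"},
        "Pfur": {"COG0001", "COG0002", "COG0004"},
    }
    d = cog_distances(cogs)
    print(d[("Aper", "Paby")], d[("Paby", "Pfur")])   # 0.5 0.0
    print(sorted(cogs_absent_from("Aper", ["Paby", "Pfur"], cogs)))  # ['COG0003']
```

The resulting distance matrix could then be fed to any standard clustering or tree-building routine.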
Sivá, Monika; Svoboda, Michal; Veverka, Václav; Trempe, J. F.; Hofmann, K.; Kožíšek, Milan; Hexnerová, Rozálie; Sedlák, František; Belza, Jan; Brynda, Jiří; Šácha, Pavel; Hubálek, Martin; Starková, Jana; Flaisigová, Iva; Konvalinka, Jan; Grantz Šašková, Klára
Vol. 6, Jul 27 (2016), article no. 30443. ISSN 2045-2322 R&D Projects: GA ČR(CZ) GBP208/12/G016 Institutional support: RVO:61388963 Keywords: human DNA-damage-inducible 2 protein; proteasome; ubiquitin; retroviral protease-like domain Subject RIV: CE - Biochemistry Impact factor: 4.259, year: 2016 http://www.nature.com/articles/srep30443
Jackson, Daniel J; Reim, Laurin; Randow, Clemens; Cerveau, Nicolas; Degnan, Bernard M; Fleck, Claudia
Despite the evolutionary success and ancient heritage of the molluscan shell, little is known about the molecular details of its formation, its evolutionary origins, or the interactions between the material properties of the shell and its organic constituents. In contrast to this dearth of information, a growing collection of molluscan shell-forming proteomes and transcriptomes suggests that they comprise both deeply conserved and lineage-specific elements. Analyses of these sequence data sets have suggested that mechanisms such as exon shuffling, gene co-option, and gene family expansion facilitated the rapid evolution of shell-forming proteomes and supported the diversification of this phylum-specific structure. To further investigate and test these ideas, we examined the molecular features and spatial expression patterns of two shell-forming genes (Lustrin and ML1A2) and coupled these observations with materials properties measurements of shells from a group of closely related gastropods (abalone). We find that the prominent "GS" domain of Lustrin, a domain believed to confer elastomeric properties to the shell, varies significantly in length between the species we investigated. Furthermore, the spatial expression patterns of Lustrin and ML1A2 also vary significantly between species, suggesting that both protein architecture and the regulation of spatial gene expression patterns are important drivers of molluscan shell evolution. Variation in these molecular features might relate to certain materials properties of the shells of these species. These insights reveal an important and underappreciated source of variation within shell-forming proteomes that must contribute to the diversity of molluscan shell phenotypes. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Bitard-Feildel, Tristan; Kemena, Carsten; Greenwood, Jenny M; Bornberg-Bauer, Erich
Orthologous protein detection software mostly uses pairwise comparisons of amino acid sequences to determine whether two proteins are orthologous. Consequently, as the number of sequences increases, the number of comparisons to compute grows quadratically. A current challenge in bioinformatics research, especially given the increasing number of sequenced organisms, is to make this ever-growing number of comparisons computationally feasible in a reasonable amount of time. We propose to speed up the detection of orthologous proteins by using strings of domains to characterize the proteins. We present two new protein similarity measures based on domain content similarity, a cosine score and a maximal weight matching score, and new software named porthoDom. The quality of the cosine and maximal weight matching similarity measures is assessed against curated datasets. The measures show that domain content similarity is able to correctly group proteins into their families. Accordingly, the cosine similarity measure is used inside porthoDom, a wrapper developed for proteinortho. porthoDom uses domain content similarity measures to group proteins together before searching for orthologs. Using domains instead of amino acid sequences reduces the search space and thereby the computational complexity of an all-against-all sequence comparison. We demonstrate that representing and comparing proteins as strings of discrete domains, i.e. as a concatenation of their unique identifiers, allows a drastic simplification of the search space. porthoDom speeds up orthology detection while maintaining a degree of accuracy similar to that of proteinortho. porthoDom is implemented in Python and C++ and is available under the GNU GPL version 3 at http://www.bornberglab.org/pages/porthoda.
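The cosine measure can be illustrated with a minimal Python sketch. It assumes each protein is reduced to a bag of domain identifiers (the Pfam-style IDs below are hypothetical placeholders) and ignores domain order; porthoDom's actual representation and scoring may differ. The helper `n_pairwise` (also hypothetical) shows why an all-against-all comparison grows quadratically with the number of proteins.

```python
from collections import Counter
from math import sqrt


def domain_cosine(domains_a, domains_b):
    """Cosine similarity of two proteins given as lists of domain IDs.

    Each protein becomes a vector of domain counts; domain order is
    ignored in this simplified sketch.
    """
    counts_a, counts_b = Counter(domains_a), Counter(domains_b)
    dot = sum(counts_a[d] * counts_b[d] for d in counts_a.keys() & counts_b.keys())
    norm_a = sqrt(sum(c * c for c in counts_a.values()))
    norm_b = sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a protein with no annotated domains matches nothing
    return dot / (norm_a * norm_b)


def n_pairwise(n):
    """Number of distinct pairs in an all-against-all comparison of n proteins."""
    return n * (n - 1) // 2


# Hypothetical domain architectures for three proteins.
kinase_a = ["PF00069", "PF00433"]
kinase_b = ["PF00069", "PF00433"]
transporter = ["PF00005", "PF00664"]

print(round(domain_cosine(kinase_a, kinase_b), 3))
print(domain_cosine(kinase_a, transporter))
print(n_pairwise(1_000), n_pairwise(10_000))
```

Comparing short domain-count vectors is far cheaper than aligning full sequences, which is the intuition behind pre-grouping proteins by domain content before a sequence-based search.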
Yeats, Trevor H.; Huang, Wenlin; Chatterjee, Subhasish
... of hydroxyacylglycerol precursors, catalyzed by the GDSL-motif lipase/hydrolase family protein (GDSL) Cutin Deficient 1 (CD1). Here, we present additional biochemical characterization of CD1 and putative orthologs from Arabidopsis thaliana and the moss Physcomitrella patens, which represent a distinct clade of cutin synthases within the large GDSL superfamily. We demonstrate that members of this ancient and conserved family of cutin synthase-like (CUS) proteins act as polyester synthases with negligible hydrolytic activity. Moreover, solution-state NMR analysis indicates that CD1 catalyzes the formation of primarily linear cutin oligomeric products in vitro. These results reveal a conserved mechanism of cutin polyester synthesis in land plants, and suggest that elaborations of the linear polymer, such as branching or cross-linking, may require additional, as yet unknown, factors.
Horner David S
Background: Recent discoveries have highlighted the fact that alternative splicing and alternative transcripts are the rule, rather than the exception, in metazoan genes. Since multiple transcript and protein variants expressed by the same gene are, by definition, structurally distinct and need not be functionally equivalent, the concept of gene orthology should be extended to the transcript level in order to describe evolutionary relationships between structurally similar transcript variants. In other words, the identification of true orthology relationships between gene products should now progress beyond primary sequence, and "splicing orthology", consisting of ancestrally shared exon-intron structures, is required to define orthologous isoforms at the transcript level. Results: As a first step in this direction, we performed a large-scale human-mouse gene comparison with a twofold goal: first, to assess whether and to what extent traditional gene annotations such as RefSeq capture genuine splicing orthology; second, to provide a more detailed annotation and quantification of true human-mouse orthologous transcripts, defined as transcripts of orthologous genes exhibiting the same splicing patterns. Conclusions: We observed an identical exon/intron structure for 32% of human and mouse orthologous genes. This figure increases to 87% using less stringent criteria for gene structure similarity, implying that for about 13% of the human RefSeq annotated genes (and about 25% of the corresponding transcripts) we could not identify any mouse transcript showing sufficient similarity to be confidently assigned as a splicing ortholog. Our data suggest that current gene and transcript data may still be rather incomplete, with several splicing variants still unknown. The observation that alternative splicing produces large numbers of alternative transcripts and proteins, some of them conserved across species and others truly species
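To make the stringent-versus-relaxed distinction concrete, here is a minimal sketch that compares two transcripts by their ordered exon lengths. Representing a splicing pattern as a list of exon lengths, the function name `same_splicing_pattern`, and the `tolerance` parameter (in nucleotides) are all illustrative assumptions, not the criteria used in the study: `tolerance=0` stands in for an identical exon/intron structure and larger values for a less stringent match.

```python
def same_splicing_pattern(exons_a, exons_b, tolerance=0):
    """Compare two transcripts given as ordered lists of exon lengths (nt).

    Returns True when both transcripts have the same number of exons and
    every pair of corresponding exons differs by at most `tolerance` nt.
    """
    if len(exons_a) != len(exons_b):
        return False  # a gained or lost exon breaks the shared structure
    return all(abs(a - b) <= tolerance for a, b in zip(exons_a, exons_b))


human = [120, 85, 201, 96]  # hypothetical exon lengths
mouse = [120, 88, 201, 96]

print(same_splicing_pattern(human, mouse))               # False (strict)
print(same_splicing_pattern(human, mouse, tolerance=3))  # True (relaxed)
```

Relaxing the tolerance can only add matches, never remove them, which mirrors how a less stringent criterion raises the fraction of genes with an assignable splicing ortholog.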
Li, Zhijie; Chakraborty, Sayan; Xu, Guozhou (NCSU)
Does not respond to nucleotides 1 (DORN1) has recently been identified as the first membrane-integral plant ATP receptor, which is required for ATP-induced calcium response, mitogen-activated protein kinase activation and defense responses in
Carra, Serena; Boncoraglio, Alessandra; Kanon, Bart; Brunsting, Jeanette F.; Minoia, Melania; Rana, Anil; Vos, Michel J.; Seidel, Kay; Sibon, Ody C. M.; Kampinga, Harm H.
Protein aggregation is a hallmark of many neuronal disorders, including the polyglutamine disorder spinocerebellar ataxia 3 and peripheral neuropathies associated with the K141E and K141N mutations in the small heat shock protein HSPB8. In cells, HSPB8 cooperates with BAG3 to stimulate autophagy in
Lee, Youngdeuk; Elvitigala, Don Anushka Sandaruwan; Whang, Ilson; Lee, Sukkyoung; Kim, Hyowon; Zoysa, Mahanama De; Oh, Chulhong; Kang, Do-Hyung; Lee, Jehee
Immune signaling cascades have an indispensable role in the host defense of almost all organisms. Tumor necrosis factor (TNF) signaling is considered a prominent signaling pathway in vertebrate as well as invertebrate species. Within this cascade, TRAF (TNF receptor-associated factor) and TNF receptor-associated protein (TTRAP) has been shown to play a crucial role in modulating immune signaling in animals. Here, we characterized a novel molluskan ortholog of TTRAP (AbTTRAP) from disk abalone (Haliotis discus discus) and analyzed its expression levels under pathogenic stress. The complete coding sequence of AbTTRAP consists of 1071 nucleotides, coding for a 357 amino acid peptide with a predicted molecular mass of 40 kDa. According to our in silico analysis, AbTTRAP resembles the typical TTRAP domain architecture, including a 5'-tyrosyl DNA phosphodiesterase domain. Moreover, phylogenetic analysis revealed its common ancestral invertebrate origin, with AbTTRAP clustering with its molluskan counterparts. Quantitative real-time PCR showed ubiquitous expression of AbTTRAP in the selected abalone tissues, with the most prominent expression in hemocytes. Upon stimulation with two pathogen-derived mitogens, lipopolysaccharide (LPS) and polyinosinic:polycytidylic acid (poly I:C), AbTTRAP transcript levels in hemocytes and gill tissues were differentially modulated over time. In addition, recombinant AbTTRAP exhibited prominent endonuclease activity against abalone genomic DNA, which was enhanced by the presence of Mg(2+) in the medium. Collectively, these results reinforce the existence of the TNF signaling cascade in mollusks such as disk abalone, further implicating the putative regulatory role of TTRAP in invertebrate host pathology. Copyright © 2014 Elsevier Ltd. All rights reserved.
Lafond, Manuel; El-Mabrouk, Nadia
A variety of methods based on sequence similarity, reconciliation, synteny, or functional characteristics can be used to infer orthology and paralogy relations between the genes of a given gene family G. But is a given set C of orthology/paralogy constraints possible, i.e., can these constraints simultaneously coexist in an evolutionary history for G? While previous studies have focused on full sets of constraints, here we consider the general case where C does not necessarily include a constraint for each pair of genes. The problem is divided into two parts: (1) Is C satisfiable, i.e., can we find an event-labeled gene tree G inducing C? (2) Is there such a G which is consistent, i.e., such that all displayed triplet phylogenies are included in a species tree? Previous results on the Graph sandwich problem can be used to answer (1), and we provide polynomial-time algorithms for satisfiability and consistency with a given species tree. We also describe a new polynomial-time algorithm for the case of consistency with an unknown species tree and full knowledge of pairwise orthology/paralogy relationships, as well as a branch-and-bound algorithm for the case when unknown relations are present. We show that our algorithms can be used in combination with ProteinOrtho, a sequence similarity-based orthology detection tool, to extract a set of robust orthology/paralogy relationships.
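For the special case of a full set of constraints (every gene pair labeled orthologous or not), a known result from the orthology literature is that the relations can be displayed by an event-labeled gene tree exactly when the orthology graph is a cograph, i.e. contains no induced path on four vertices (P4). The brute-force check below is a minimal illustration of that satisfiability condition, assuming genes are vertices and orthologous pairs are edges. It is O(n^4), not one of the polynomial-time algorithms of this paper, and it addresses neither partial constraints nor consistency with a species tree; the function name `is_cograph` is hypothetical.

```python
from itertools import combinations


def is_cograph(vertices, edges):
    """Return True if the undirected graph has no induced P4 (path on 4 vertices).

    Brute force over all 4-vertex subsets. Among graphs on 4 vertices, the
    sorted induced degree sequence [1, 1, 2, 2] occurs only for P4.
    Assumes a simple graph (no self-loops).
    """
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    for quad in combinations(vertices, 4):
        degrees = sorted(sum(1 for w in quad if w in adj[v]) for v in quad)
        if degrees == [1, 1, 2, 2]:
            return False  # found an induced P4: no gene tree can explain it
    return True


genes = ["a1", "a2", "b1", "b2"]  # hypothetical genes
# Orthology edges forming a path a1-b1-a2-b2: not realizable.
print(is_cograph(genes, [("a1", "b1"), ("b1", "a2"), ("a2", "b2")]))  # False
# a1 and a2 both orthologous to b1 only (star): realizable.
print(is_cograph(genes, [("a1", "b1"), ("a2", "b1")]))  # True
```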
Cross-species prophylactic efficacy of Sm-p80-based vaccine and intracellular localization of Sm-p80/Sm-p80 ortholog proteins during development in Schistosoma mansoni, Schistosoma japonicum, and Schistosoma haematobium.
Molehin, Adebayo J; Sennoune, Souad R; Zhang, Weidong; Rojo, Juan U; Siddiqui, Arif J; Herrera, Karlie A; Johnson, Laura; Sudduth, Justin; May, Jordan; Siddiqui, Afzal A
Schistosomiasis remains a major global health problem. Despite large-scale schistosomiasis control efforts, clear limitations, such as the possible emergence of drug resistance and reinfection rates, highlight the need for an effective schistosomiasis vaccine. Schistosoma mansoni large subunit of calpain (Sm-p80)-based vaccine formulations have shown remarkable efficacy in protecting against S. mansoni challenge infections in mice and baboons. In this study, we evaluated the cross-species protective efficacy of the Sm-p80 vaccine against S. japonicum and S. haematobium challenge infections in rodent models. We also elucidated the expression of Sm-p80 and Sm-p80 ortholog proteins in different developmental stages of S. mansoni, S. haematobium, and S. japonicum. Immunization with the Sm-p80 vaccine reduced worm burden by 46.75% against S. japonicum challenge infection in mice. A DNA prime/protein boost (1 + 1 dose administered on a single day) resulted in a 26.95% reduction in worm burden in the S. haematobium-hamster infection/challenge model. Balanced Th1 (IFN-γ, TNF-α, IL-2, and IL-12) and Th2 (IL-4, IgG1) type responses were observed following vaccination in both the S. japonicum and S. haematobium challenge trials, and these are associated with the prophylactic efficacy of the Sm-p80 vaccine. Immunohistochemistry demonstrated that Sm-p80/Sm-p80 ortholog proteins are expressed in different life cycle stages of the three major human schistosome species studied. The data presented in this study reinforce the potential of the Sm-p80-based vaccine for both hepatic/intestinal and urogenital schistosomiasis occurring in different geographical areas of the world. Differential expression of Sm-p80/Sm-p80 protein orthologs in different life cycle stages makes this vaccine potentially useful in targeting different levels of infection, disease, and transmission.
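Percent reduction in worm burden is conventionally expressed relative to the control group's mean burden. The sketch below assumes that convention and uses made-up worm counts; the study's exact calculation (means versus medians, group sizes) is not stated here, and `percent_reduction` is a hypothetical helper.

```python
from statistics import mean


def percent_reduction(control_counts, vaccinated_counts):
    """Percent reduction of mean worm burden in vaccinated vs. control animals."""
    control = mean(control_counts)
    vaccinated = mean(vaccinated_counts)
    return 100.0 * (control - vaccinated) / control


# Hypothetical worm counts per animal (not data from the study).
control = [42, 38, 45, 35]
vaccinated = [22, 20, 25, 18]
print(f"{percent_reduction(control, vaccinated):.2f}%")
```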
Sutphin, George L.; Mahoney, J. Matthew; Sheppard, Keith; Walton, David O.; Korstanje, Ron
The rapid advancement of technology in genomics and targeted genetic manipulation has made comparative biology an increasingly prominent strategy to model human disease processes. Predicting orthology relationships between species is a vital component of comparative biology. Dozens of strategies for predicting orthologs have been developed using combinations of gene and protein sequence, phylogenetic history, and functional interaction with progressively increasing accuracy. A relatively new class of orthology prediction strategies combines aspects of multiple methods into meta-tools, resulting in improved prediction performance. Here we present WORMHOLE, a novel ortholog prediction meta-tool that applies machine learning to integrate 17 distinct ortholog prediction algorithms to identify novel least diverged orthologs (LDOs) between 6 eukaryotic species: humans, mice, zebrafish, fruit flies, nematodes, and budding yeast. Machine learning allows WORMHOLE to intelligently incorporate predictions from a wide spectrum of strategies in order to form aggregate predictions of LDOs with high confidence. In this study we demonstrate the performance of WORMHOLE across each combination of query and target species. We show that WORMHOLE is particularly adept at improving LDO prediction performance between distantly related species, expanding the pool of LDOs while maintaining low evolutionary distance and a high level of functional relatedness between genes in LDO pairs. We present extensive validation, including cross-validated prediction of PANTHER LDOs and evaluation of evolutionary divergence and functional similarity, and discuss future applications of machine learning in ortholog prediction. A WORMHOLE web tool has been developed and is available at http://wormhole.jax.org/. PMID:27812085
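One simple way to learn how much to trust each of several ortholog predictors is to fit a classifier on their binary calls. The sketch below swaps in plain logistic regression trained by batch gradient descent as an illustration; WORMHOLE's actual classifier, features, and training data are not described in this abstract, and the vote vectors, labels, and function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(votes, labels, lr=0.5, epochs=2000):
    """Fit per-algorithm weights for binary ortholog calls (batch gradient descent).

    votes:  list of 0/1 vectors, one entry per prediction algorithm
    labels: 1 if the pair is a reference LDO, else 0
    """
    n_alg = len(votes[0])
    w, b = [0.0] * n_alg, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_alg, 0.0
        for x, y in zip(votes, labels):
            # Gradient of the log-loss: (predicted probability - label) * feature.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            for j in range(n_alg):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / len(votes) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(votes)
    return w, b


def score(w, b, x):
    """Probability that a candidate pair is an LDO under the fitted model."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical training pairs: calls from 3 algorithms and known labels.
votes = [[1, 1, 1], [1, 1, 0], [0, 0, 1], [0, 0, 0]]
labels = [1, 1, 0, 0]
w, b = train_logistic(votes, labels)
print([round(wj, 2) for wj in w], round(b, 2))
print(round(score(w, b, [1, 1, 0]), 3))
```

Unlike a raw vote count, the learned weights let an algorithm that agrees with the reference labels count for more than one that does not (here the third algorithm's calls are uninformative and receive a negative weight).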
Background: The concept of orthology is key to decoding evolutionary relationships among genes across different species using comparative genomics. QuartetS is a recently reported algorithm for large-scale orthology detection. Based on the well-established evolutionary principle that gene duplication events discriminate paralogous from orthologous genes, QuartetS has been shown to improve orthology detection accuracy while maintaining computational efficiency. Description: QuartetS-DB is a new orthology database constructed using the QuartetS algorithm. The database provides orthology predictions among 1621 complete genomes (1365 bacterial, 92 archaeal, and 164 eukaryotic), covering more than seven million proteins and four million pairwise orthologs. It is a major source of orthologous groups, containing more than 300,000 groups of orthologous proteins and 236,000 corresponding gene trees. The database also provides over 500,000 groups of inparalogs. In addition to its size, a distinguishing feature of QuartetS-DB is that users can select a cutoff value that modulates the balance between prediction accuracy and coverage of the retrieved pairwise orthologs. The database is accessible at https://applications.bioanalysis.org/quartetsdb. Conclusions: QuartetS-DB is one of the largest orthology resources available to date. Because its orthology predictions are underpinned by evolutionary evidence obtained from sequenced genomes, we expect its accuracy to continue to increase in future releases as the genomes of additional species are sequenced.
Pseudomonas aeruginosa possesses a type III secretion system (T3SS) to intoxicate host cells and evade innate immunity. This virulence-related machinery consists of a molecular syringe and needle assembled on the bacterial surface, which allows delivery of T3 effector proteins into infected cells. To accomplish one-step effector translocation, a tip protein is required at the top end of the T3 needle structure. Strains lacking expression of a functional tip protein fail to intoxicate host cells. P. aeruginosa encodes a T3SS that is highly homologous to that encoded by Yersinia species. The needle tip proteins of Yersinia (LcrV) and P. aeruginosa (PcrV) share 37% identity and 65% similarity. Other known tip proteins are AcrV (Aeromonas), IpaD (Shigella), SipD (Salmonella), BipD (Burkholderia), EspA (EPEC, EHEC), and Bsp22 (Bordetella), with additional proteins identified in various Gram-negative species, such as Vibrio and Bordetella. The tip proteins can serve as protective antigens or may be critical for sensing host cells and evading innate immune responses. Recognition of the host microenvironment transcriptionally activates synthesis of T3SS components. The machinery appears to be mechanically controlled by the assembly of specific junctions within the apparatus. These junctions include the tip and base of the T3 apparatus, the needle proteins, and components within the bacterial cytoplasm. The tip proteins likely have chaperone functions for translocon proteins, allowing proper assembly of translocation channels in the host membrane and completing vectorial delivery of effector proteins into the host cytoplasm. The multifunctional features of the needle-tip proteins appear to be intricately controlled. In this review, we highlight the functional aspects and complex controls of T3 needle-tip proteins, with particular emphasis on PcrV and LcrV.
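Figures such as "37% identity" depend on how the denominator is chosen (alignment length, length of the shorter sequence, or only gap-free columns). The sketch below assumes the last convention and works on an already-computed pairwise alignment; the sequences are hypothetical fragments, not LcrV or PcrV. A similarity percentage would additionally need a substitution matrix or residue grouping, which is not shown.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Hypothetical aligned fragments.
print(percent_identity("MKT-LVE", "MRTALIE"))
```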
Wolf Yuri I; Novichkov Pavel S; Sorokin Alexander V; Makarova Kira S; Koonin Eugene V
Background: An evolutionary classification of genes from sequenced genomes that distinguishes between orthologs and paralogs is indispensable for genome annotation and evolutionary reconstruction. Shortly after multiple genome sequences of bacteria, archaea, and unicellular eukaryotes became available, an attempt at such a classification was implemented in the Clusters of Orthologous Groups of proteins (COGs). Rapid accumulation of genome sequences creates opportunities for refining COGs ...
The human ATRX gene encodes hATRX, a chromatin-remodeling protein harboring a helicase/ATPase domain and an ADD domain. The ADD domain has two zinc fingers that bind to histone tails and mediate hATRX binding to chromatin. dAtrx, the putative ATRX homolog in Drosophila melanogaster, has a conserved helicase/ATPase domain but lacks the ADD domain. A bioinformatic search of the Drosophila genome using the human ADD sequence allowed us to identify the annotated gene CG8290, which encodes three ADD-harboring isoforms generated by alternative splicing. As shown by 3D modeling, this Drosophila ADD domain is highly similar to the ADD domain of hATRX, both in structure and in the amino acids that mediate histone tail contacts. The CG8290 annotated gene has very recently been named dadd1. We show through pull-down and CoIP assays that the products of the dadd1 gene interact physically with dAtrxL and HP1a, and that all of them mainly co-localize in the chromocenter, although euchromatic localization can also be observed along the chromosome arms. We confirm through ChIP analyses that these proteins are present in vivo in the same heterochromatic regions. The three isoforms are expressed throughout development. Flies carrying transheterozygous combinations of the dadd1 and atrx alleles are semi-viable and show different phenotypes, including the appearance of melanotic masses. Interestingly, the dAdd1-b and c isoforms have extra domains, such as MADF, which suggests newly acquired functions for these proteins. These results strongly support the view that, in Drosophila, the atrx gene diverged and that the dadd1-encoded proteins participate with dAtrx in some cellular functions, such as heterochromatin maintenance.
Zhang, Xueying; Wang, Liman; Xu, Xiaoyang; Cai, Caiping; Guo, Wangzhen
Mitogen-activated protein kinase (MAPK) cascades play a crucial role in plant growth and development as well as in biotic and abiotic stress responses. Knowledge about the MAPK gene family in cotton is limited, and no systematic investigation of MAPK family proteins has been reported. By performing a bioinformatics homology search, we identified 28 putative MAPK genes in the Gossypium raimondii genome. These MAPK members were anchored onto 11 chromosomes in G. raimondii, with uneven distribution. Phylogenetic analysis showed that the MAPK candidates could be classified into the four known groups (A, B, C and D), with more MAPKs containing the TEY phosphorylation site (18 members) than the TDY motif (10 members). Furthermore, 21 cDNA sequences of MAPKs with complete open reading frames (ORFs) were identified in G. hirsutum via PCR-based approaches, including 13 novel MAPKs and eight with homologs reported previously in tetraploid cotton. The expression patterns of 23 MAPK genes reveal their important roles in diverse functions in cotton, both at various stages of vegetative and reproductive growth and in the stress response. Using a reverse genetics approach based on tobacco rattle virus-induced gene silencing (TRV-VIGS), we further verified that MPK9, MPK13 and MPK25 confer resistance to defoliating isolates of Verticillium dahliae in cotton: silencing MPK9, MPK13 and MPK25 can significantly enhance cotton susceptibility to this pathogen. This study presents a comprehensive identification of 28 mitogen-activated protein kinase genes in G. raimondii and verifies their phylogenetic relationships, transcript expression patterns and responses to various stressors. It provides the first systematic analysis of MAPKs in cotton, improving our understanding of defense responses in general and laying the foundation for future crop improvement using MAPKs.
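The TEY/TDY split refers to the dual-phosphorylation motif in the MAPK activation loop. A minimal sketch of that classification, assuming each input is already the activation-loop peptide (a real analysis would locate the loop via a multiple alignment, since a naive search of a full-length protein could hit an unrelated TEY/TDY elsewhere), could look like this; the peptides and gene labels are hypothetical.

```python
import re
from collections import Counter

# Activation-loop dual phosphorylation motif: T-E-Y or T-D-Y.
TXY = re.compile(r"T([ED])Y")


def classify_txy(activation_loop):
    """Return 'TEY', 'TDY', or None for an activation-loop peptide."""
    match = TXY.search(activation_loop)
    return f"T{match.group(1)}Y" if match else None


loops = {  # hypothetical activation-loop peptides
    "MPK_x": "DFGLARTTSETDFMTEYVVTRWYRAPE",
    "MPK_y": "DFGLARVAFNDTPTTIFWTDYVATRWYRAPE",
    "MPK_z": "DFGLARTTSETDFMTEYVVTRWYRAPE",
}
print(Counter(classify_txy(seq) for seq in loops.values()))
```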
Ruttink, Tom; Sterck, Lieven; Rohde, Antje
Despite current advances in next-generation sequencing data analysis procedures, de novo assembly of a reference sequence required for SNP discovery and expression analysis is still a major challenge in genetically uncharacterized, highly heterozygous species. High levels of polymorphism inherent to outbreeding crop species hamper De Bruijn Graph-based de novo assembly algorithms, causing transcript fragmentation and the redundant assembly of allelic contigs. If multiple genotypes are sequenced to study genetic diversity, primary de novo assembly is best performed per genotype to limit the level of polymorphism and avoid transcript fragmentation. Here, we propose an Orthology Guided Assembly procedure that first uses sequence similarity (tBLASTn) to proteins of a model species to select allelic and fragmented contigs from all genotypes and then performs CAP3 clustering on a gene-by-gene basis. Thus, we...
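The gene-by-gene grouping step of the Orthology Guided Assembly procedure can be sketched as follows: each contig, from any genotype, is assigned to the model-species protein it matches best, and each resulting group would then be assembled separately. The best-hit rule, the `min_bitscore` cutoff, the function name, and the IDs are assumptions for illustration; parsing tBLASTn output and running CAP3 are not shown.

```python
from collections import defaultdict


def group_contigs_by_gene(hits, min_bitscore=50.0):
    """Assign each contig to its best-scoring model-species protein.

    hits: iterable of (contig_id, protein_id, bitscore) tuples, e.g. parsed
          from tBLASTn tabular output.
    Returns {protein_id: sorted list of contig_ids}; each group would then be
    clustered separately (e.g. with CAP3, not invoked here).
    """
    best = {}
    for contig, protein, bitscore in hits:
        if bitscore < min_bitscore:
            continue  # ignore weak similarity
        if contig not in best or bitscore > best[contig][1]:
            best[contig] = (protein, bitscore)
    groups = defaultdict(list)
    for contig, (protein, _) in best.items():
        groups[protein].append(contig)
    return {protein: sorted(contigs) for protein, contigs in groups.items()}


# Hypothetical hits from two genotypes (G1, G2) against model proteins.
hits = [
    ("G1_contig1", "modelProt_A", 310.0),
    ("G2_contig7", "modelProt_A", 295.5),
    ("G2_contig8", "modelProt_A", 120.0),
    ("G2_contig8", "modelProt_B", 140.0),
    ("G1_contig4", "modelProt_B", 35.0),
]
print(group_contigs_by_gene(hits))
```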
Rasmussen, Jane Lind Nybo; Vesth, Tammi Camilla; Theobald, Sebastian
The Aspergillus genus contains leading industrial microorganisms, excelling in producing bioactive compounds and enzymes. Using synthetic biology and bioinformatics, we aim to re-engineer these organisms for applications within human health, pharmaceuticals, environmental engineering, and food ... of genotype-to-phenotype. To achieve this, we have developed orthologous protein prediction software that utilizes genus-wide genetic diversity. The approach is optimized for large data sets, is based on BLASTp considering protein identity and alignment coverage, and clusters proteins using single linkage of bi-directional hits. The result is orthologous protein families describing the genomic and functional features of individual species, clades and the core/pan genome of Aspergillus, applicable to genotype-to-phenotype analyses in other microbial genera.
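The clustering idea (filter BLASTp hits by identity and coverage, keep only hits found in both directions, then merge by single linkage) can be sketched with a union-find structure. The thresholds, the definition of coverage as the aligned fraction of the query, the function name, and the protein IDs are all hypothetical; this is not the published software.

```python
def orthologous_families(hits, min_identity=50.0, min_coverage=0.8):
    """Single-linkage protein families from bidirectional BLASTp hits.

    hits: iterable of (query, subject, percent_identity, coverage) tuples,
          coverage being the aligned fraction of the query (0-1).
    A pair is linked only if it passes both thresholds in BOTH directions.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    passing = set()
    for query, subject, identity, coverage in hits:
        find(query)
        find(subject)  # register every protein, even without a passing hit
        if query != subject and identity >= min_identity and coverage >= min_coverage:
            passing.add((query, subject))

    for query, subject in passing:
        if (subject, query) in passing:  # bidirectional hit
            root_q, root_s = find(query), find(subject)
            if root_q != root_s:
                parent[root_s] = root_q  # single linkage: merge the two families

    families = {}
    for protein in parent:
        families.setdefault(find(protein), []).append(protein)
    return sorted(sorted(members) for members in families.values())


hits = [  # hypothetical BLASTp results
    ("An_1", "Af_1", 82.0, 0.95), ("Af_1", "An_1", 82.0, 0.97),
    ("Af_1", "Ao_1", 75.0, 0.90), ("Ao_1", "Af_1", 75.0, 0.92),
    ("An_2", "Ao_2", 40.0, 0.90), ("Ao_2", "An_2", 40.0, 0.88),
    ("An_3", "Af_3", 90.0, 0.99),
]
print(orthologous_families(hits))
```

Single linkage lets An_1 and Ao_1 join one family through their shared partner Af_1 even without a direct hit, which keeps the clustering fast but can also chain together loosely related proteins.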
Hu, Yanhui; Flockhart, Ian; Vinayagam, Arunachalam; Bergwitz, Clemens; Berger, Bonnie; Perrimon, Norbert; Mohr, Stephanie E
Mapping of orthologous genes among species serves an important role in functional genomics by allowing researchers to develop hypotheses about gene function in one species based on what is known about the functions of orthologs in other species. Several tools for predicting orthologous gene relationships are available. However, these tools can give different results and identification of predicted orthologs is not always straightforward. We report a simple but effective tool, the Drosophila RNAi Screening Center Integrative Ortholog Prediction Tool (DIOPT; http://www.flyrnai.org/diopt), for rapid identification of orthologs. DIOPT integrates existing approaches, facilitating rapid identification of orthologs among human, mouse, zebrafish, C. elegans, Drosophila, and S. cerevisiae. As compared to individual tools, DIOPT shows increased sensitivity with only a modest decrease in specificity. Moreover, the flexibility built into the DIOPT graphical user interface allows researchers with different goals to appropriately 'cast a wide net' or limit results to highest confidence predictions. DIOPT also displays protein and domain alignments, including percent amino acid identity, for predicted ortholog pairs. This helps users identify the most appropriate matches among multiple possible orthologs. To facilitate using model organisms for functional analysis of human disease-associated genes, we used DIOPT to predict high-confidence orthologs of disease genes in Online Mendelian Inheritance in Man (OMIM) and genes in genome-wide association study (GWAS) data sets. The results are accessible through the DIOPT diseases and traits query tool (DIOPT-DIST; http://www.flyrnai.org/diopt-dist). DIOPT and DIOPT-DIST are useful resources for researchers working with model organisms, especially those who are interested in exploiting model organisms such as Drosophila to study the functions of human disease genes.
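A common way to integrate several ortholog predictors, and to let users either "cast a wide net" or keep only high-confidence calls, is to count how many tools support each gene pair and filter by a minimum count. The sketch below assumes that scoring scheme for illustration; the tool names, gene IDs, and helper functions are hypothetical, and DIOPT's own scoring may differ in detail.

```python
from collections import Counter


def vote_scores(predictions):
    """Count how many tools predict each (query_gene, ortholog_gene) pair.

    predictions: dict mapping tool name -> set of (query_gene, ortholog_gene).
    """
    scores = Counter()
    for pairs in predictions.values():
        scores.update(pairs)
    return scores


def filter_pairs(scores, min_votes):
    """Keep pairs supported by at least `min_votes` tools."""
    return {pair for pair, n in scores.items() if n >= min_votes}


predictions = {  # hypothetical tool outputs
    "tool_A": {("fly_g1", "human_g1"), ("fly_g1", "human_g2")},
    "tool_B": {("fly_g1", "human_g1")},
    "tool_C": {("fly_g1", "human_g1"), ("fly_g1", "human_g3")},
}
scores = vote_scores(predictions)
print(filter_pairs(scores, 1))  # wide net
print(filter_pairs(scores, 3))  # highest confidence only
```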
Altenhoff, Adrian M; Boeckmann, Brigitte; Capella-Gutierrez, Salvador
Achieving high accuracy in orthology inference is essential for many comparative, evolutionary and functional genomic analyses, yet the true evolutionary history of genes is generally unknown and orthologs are used for very different applications across phyla, requiring different precision-recall...
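The precision-recall trade-off can be made concrete by scoring a set of predicted ortholog pairs against a reference set. Treating pairs as unordered and the reference as complete are simplifying assumptions (real benchmarks must rely on proxies precisely because the true gene history is unknown); the gene IDs and function name are hypothetical.

```python
def precision_recall(predicted_pairs, reference_pairs):
    """Precision and recall of predicted ortholog pairs against a reference set.

    Pairs are treated as unordered, so (a, b) and (b, a) count as the same pair.
    """
    predicted = {frozenset(p) for p in predicted_pairs}
    reference = {frozenset(p) for p in reference_pairs}
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


predicted = [("h1", "m1"), ("m2", "h2"), ("h3", "m9")]
reference = [("h1", "m1"), ("h2", "m2"), ("h4", "m4"), ("h5", "m5")]
print(precision_recall(predicted, reference))
```

Predicting more pairs typically raises recall at the cost of precision, which is why different applications call for different operating points.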
...all1672 Nostoc sp. PCC 7120 MFKILFDSDLILDAVMNRTELAEDVRTLLENLHPSIRLYLTDVGLQKVSTYTYCLKNSQIPEIIVDWLQEQIQICPIDQGLLQKARYSPLRDFESAVELACINHYQLNAIVTNKPEDFIVTAHPLCVWSFADLWLRVNLESQLQATIHS ...
...ica MASRSITYPGLLLLVVALVLVPSSQAKRSPPRSAPTPAPRIAPSPAPRSAPTPAPRSAPTPAPRAPQALSPTPAPTPAPRTAPTPAPRSAPTPAPRA...PQAPSPTPAPTPAPTPAPRTAPTPAPRSAPTPAPRAPQAPSPTPAPMPAPRTAPTLAPRSAPTPAPRAPQAPSPTPAPRTAPISAPTPAPRSAPTPAPRA...PQAPSPTPAPTPAPRTAPTPAPRSAPTPAQRAPQAPSPTPAPTPPPRTAPTPAPAPAPKSAPTPAPMAPQAPSPTPAPTASPMTAPTPAPTSAPTPAPRAPQAPSPTPAPTPAPMDGSNTCS
Synechocystis sp. PCC 6803 MYFLLVTLVILVFPLLSIALEWTTSGNSQALVDVLARWFVFWGVGVRLFLAGVVQITKPSFTAEKILGVQSQDSLILVKELGIGNLAIASVALGSIFVNAWVLGAALAGGIFYLLAGINHILQPERNAKENYAMATDLFLGLLLGGILFFAWQP ...
Glycine max MRKASSTMHCALLRFWVPLLLLASFSYAPSVSATTEISGNEVNTDGKVINEELAKPSLKGHDEEAKFKGFFPKPIPIVKPIPKPIPIIKPI...PIPVYKPIPKPIPIVKPIPKSIPVAKPIPNEEEKFKGFIPKPIPIIKPIPKIIPIVKPIPKIIPIVKPIPIPVYKPKPKLIPVVKPIPVIKPVPKIIPVVKPIPIIKPVPKPI...PIIKPIPKPFIVKKPIPTVESEEFLKPKPFFKKPIIPKLPLHPKFKKPLLPPLPIHKPIPTP
Glycine max MRKPSSLQSALLRFGVPLLLVITFCYATSTAAATTQVSGNEELPKTNLNGHDEEAKFGFFHHKPIFKKHIPIPVYKPVPKPFPVYKPI...PKPVPVPVYKPIPKPVYKPIPKPVPVYKPIPKPVPVPVYKPIPKPVYKPIPKPVYKPIPKPVPVYKPIPKPVPVPVYKPIPKPVYKPIPKPVYKPIPKPVPVYKPIPKPVPVYKPI...PKPVYKPIPKPVPVYKPIPKPVPIYKPIPIFKPIPKPVPIYKPIPIFKPVPKPVPIYKPIPIFKPVPTPIPFFKPIPKPVPIVKPIPIPVFKPIPKPFPIVKPIP
SGAIIPALMHRIPSELRSSAAGGGEGSCALLAQDMSGAIIPALMHRIPSELRSSAAGGGEGSCALLAQDMSGSAAGGGEGSCALLAQGMSGAIIPALMHRIPSELRSSAAGGAGGVLCIVGAGHVGCDHTST...NAPDPIRTPQLGGGGRGGVLCIVGAGHVGCDHTSTNAPDPIRTPQLGGGGRGGVLCIVGAGHVGCDHTST...NAPDPIRTPQLGGGGRGGVLCIVGAGHVGCDHTSTNAPDLIRTPQLGGGGRGGVFCIVGAGHVGCDHTSTNAPDLIRTLQLGGGGRGGVFRIVGAGHVGCDHTST
KIYTIPDHVKKILRNLPARFRSKVTTIQEAKDLNNLLEINLLRSPDPLLWNLKDNMLKLFKMLNPKKRILMENQTVVQMRKRWCDHTITNAPDPIRTPQLIVLGCDHTSTNAPDPIRTPQLIVLGCDHTST...NAPDSIRTQQLTVLGCDHISTNAPDPIRTQQLSVLGCDHTSTNAPEPIRTQQLSVLGCDHTSTNAPDPIRTQQLSVLGCDHTSTN...APDSIRTQQLTVLGCDHISTNAPDPIRTQQLSVLGCDHTSTNAPDPIRTQQLSVLGCDHTSTNAPDPIRTQQLSVLGCDHISTNAPDPIRTPQLRVLGRE
MAFKKMSMIPCRLRNRWINCRKLLCRMNFVITHIYREDNRCADKLASKGLDIEGVTIWLNMPDFINNFVIHDRLGMCDHTSTNAPDPIRTPQLSVLGCDHTST...NAPDPIRTPQLSVLGCDHTSTNAPDPIKTPQLSVLGCDHTSTNAPDPIRTPQLSVLGSKSVADFNRTRAQKVPNRWPIFHDAKAQNCSKNRWCVHTST...DAPDPIRTPQLSVLGCVHTSTDAPDPIRTPQLRVLGCVHTSTDAPDPIRTLQLSVLGFVYTSTDAPDPIRTPQLSVLGFHQNSVVKRAWARVVLGWVTFWEVLVLCDHTST
PAVLGINFGKSKLTPLEQAPDDYAASLECLAPWADYAVINVSSPNTPGLRDLQDSTQLRRLVERLRRLPACPPLLVKIAPD...LEGVATELQRQDLRLEQVLFGCRFSNPVGLAAGFDKNGVAAGVWDCFGFGFAEVGTVTWHGQPGNPRPRLFRLAAERAALNRMGFNNNGAEAMQRTLERQALPAPGHR
INWSLSGPMLRASGVSWDLRKVDSYECYDDFDWQIASEKEGDCYARYRVRVEEMRQSLSIIRQACKMIPGGPTENLEAQRMATEDKKSEIFGIDYQYVAKKVAPTFKIPNGELYTRLESGKGEIGVFIQGNNEVTPWRFKIRAADLNNLQILPHILKGAKIADIMAILGSIDVIMGSVDR
Protein SPLC1_S051210 Arthrospira platensis C1 MKVAINKLPLPTIIDDCTLSSFVPAIMAILELVSVAALGVLVGLFLFNINSKREEQRQLDHAFYQLIESQDGEISLIQLAALARVSADVAQEYLDRQAGVFAAIPEFDQDGNTFYRFPRLRLPKQFPERSPNQDW ...
Protein AmaxDRAFT_3596 Arthrospira maxima CS-328 MAINKLPLPTIIDDCTLSSFVPAIMAILELVSVAALGVLVGLFLFNINSKREEQRQLDHAFYQLIESQDGEISLIQLAALARVSADVAQEYLDRQAGVFAAIPEFDQDGNTFYRFPRLRLPKQFPERSPNQDW ...
...02412 Synechococcus sp. PCC 7502 MKLTPYLFLTITVTAIIGTSVWQSSAQMNKMMNHNMDEMSMELGAADANLDLRFIDAMIPHHQGAVQMAKEALKK...SKRPEIQKLATAIIKAQQEEIAQLQKWRKLWYPNMSSTPMAWHGEMGHMMTMSASQQKAMMMSMDLGAGDAKFDLRFIDAMIPHHEGALTMAQEALSKSKRPEIQKLAKAIITSQKAEIIEMQKWRKAWY ...
GGWIGITDKYWMTALVPDQSANSRMSFSDTPLRGQDVYQADYLRDPITVPANGTASITDRLFAGAKVVRIIDAYEGALGIDNFELAIDWGWFYFITKPLFLALLYIQGIVGNFGVAIIVLTILIKLAFFPLAN...TSYVAMSKMKKVQPEMMKLRDRYKDDKQRQQQELMELYRREKVNPLAGCLPILIQIPVFFALYKVLFVTIEMRHAPFFGWIED
Full Text Available VIIHAYSGNIEIESGATIGSGVLLVGKSKIGANVCIGSLATILEQNLESEKVVLPASIIGNSGRQFSDNSTISLPDQDSNQSYLFSNETQESSYSLNLANTASSTEETSTETEKANTQLPLANTS...LPAEETPTETEKANTQLPLANTSLPAEETPTETEKANTQLPLANTSLPVEETPTETEKANTQLPLANTSLPVEETPTETEKANTQLQEESPPNIDAQIYGKEYVNKIMQTLFPYKNSLSSHPDDED ...
Full Text Available se with PAS/PAC and GAF sensors Oscillatoria nigro-viridis PCC 7112 MIEESKSIKEKFGVLDSVPVGACLLQDDFVVLFWNTCLEE...YP_007117793.1 1117:4890 1150:2464 1158:318 482564:246 179408:246 diguanylate cycla
Full Text Available se with PAS/PAC and GAF sensors Arthrospira maxima CS-328 MMDKYLCPCCSEPLLIHIIAHKKIGFCMNCHQEMPLIEQSRQMATVTEPV...ZP_03273626.1 1117:4884 1150:2505 35823:234 129910:158 513049:158 diguanylate cycla
Full Text Available ZP_09781866.1 1117:4884 1150:2505 35823:234 376219:95 putative Diguanylate cyclase with PAS/PAC and GAF sens...ors Arthrospira sp. PCC 8005 MNQLMEDRSKILWIAGNVGNDNHSLPQSILQNNGYEVHLVIGLKPAYNAIQSWP
Full Text Available se with PAS/PAC and GAF sensors Oscillatoria nigro-viridis PCC 7112 MYLILPDLYANMTYQIDERLNTSPCGFLSFADDGTIVMVN...YP_007118829.1 1117:4890 1150:2464 1158:318 482564:246 179408:246 diguanylate cycla
Full Text Available ith PAS/PAC and Chase2 sensors Nostoc sp. PCC 7107 MSKQLGKSFVSSNLNLNLKQLLDRKYRQLVVAFSVAVCIILLRSVGMFQSLELAGLD...YP_007048593.1 1117:4890 1161:684 1162:948 1177:381 317936:58 diguanylate cyclase w
Full Text Available ZP_09782276.1 1117:4884 1150:2505 35823:234 376219:114 putative Diguanylate cyclase with PAS/PAC and GAF sen...sors Arthrospira sp. PCC 8005 MMDKYLCPCCSEPLLIHIIAHKKIGFCMNCHQEMPLIEQSRQMATVTEPVDVS
Full Text Available ZP_08491810.1 1117:4890 1150:2448 44471:122 119532:63 756067:63 diguanylate cyclase with PAS/PAC and GAF sen...sors Microcoleus vaginatus FGP-2 MANMTYQIDELLNTSPCGFLSFADDGTILMVNATLLQLLGYETDELRERK
Full Text Available se with PAS/PAC and GAF sensors Oscillatoria nigro-viridis PCC 7112 MLYNNEILPTLTVESSPRSMNILLYKLLSLRRIEYIAVDR...YP_007115817.1 1117:4890 1150:2464 1158:318 482564:246 179408:246 diguanylate cycla
Full Text Available YP_007068440.1 1117:4890 1161:684 1185:224 1186:169 99598:92 diguanylate cyclase with PAS/PAC and Chase2 sen...sors Calothrix sp. PCC 7507 MSKQLGKCLVKFIFGLKQSLGRGHRELITASSVVICILFLRSIGLLQFLELAALD
Full Text Available se with PAS/PAC and GAF sensors Crinalium epipsammum PCC 9333 MIEQDKTKDQLLAELATMRQLNNFLLFSGMGVQQHLEKLLIEEREF...YP_007141850.1 1117:4890 1150:2445 241421:53 241425:53 1173022:53 diguanylate cycla
Full Text Available YP_007168782.1 1117:4879 1118:3357 92682:39 76023:39 65093:39 diguanylate cyclase with PAS/PAC and GAF senso...rs Halothece sp. PCC 7418 MDKYLARRTQDLRQQAQARLEQRERETDLNEMTPAELAHELEIHQTELEIQYEELQR
Full Text Available ZP_08493855.1 1117:4890 1150:2448 44471:122 119532:63 756067:63 diguanylate cyclase with PAS/PAC and GAF sen...sors Microcoleus vaginatus FGP-2 MIEESKSIKEKFGVLDSVPVGACLLQDDFVVLFWNTCLEEWTKIPRSQIL
Full Text Available YP_003890085.1 NC_014501 1117:8352 ... 1118:3762 1301283:19569 ... 43988:641 497965:226 ... PAS/PAC and GAF sens...ors-containing diguanylate cyclase Cyanothece sp. PCC 7822 MWEFISNFLAPKSYIPHGHCYLWQ
Full Text Available WP_026797240.1 ... 1117:7814 ... 1150:52263 1301283:73459 ... 54304:2105 54307:820 ... rRN...LSQDTPRCEWVSPEVLEAMATTVHPDGVIALAPRIPSKPQTLEGLGIALETLQDPGNLGTIIRTAAATGVEGLWLSADSVDLDHPKVLRASAGAWFGLNKTVSPNLAR
Full Text Available agellar biosynthesis anti-sigma factor FlgM Planktothrix prolifica MDFNFFLQSFLNGLSIGSVYAIFALGYTLVFSILGVINFAH...AINFGTATKPIMIRSVQVIIFTVCMVIVALLTYLVNKTKIGKALQAVAEDEITASLLGINPEQFIILTFFVSGFLAGLAGT
Full Text Available WP_027248983.1 ... 1117:6845 ... 1150:53031 1301283:74313 ... 54304:678 1160:388 ... modification methylase Plankt...LSEKPDDRGYFPNLKTLRNYVEKTGESIDKFRFDVQVGERATERSKAKIYPEEYHLQELEDRLLYEESAFAADIRGKDRPVDRKLNGNGNKTNNSPQQIAEYIQPELWS
Full Text Available WP_027255124.1 ... 1117:6845 ... 1150:53031 1301283:74313 ... 54304:678 1160:388 ... modification methylase Plankt...LSGKPDDRGYFPNLKTLRNYVEKTGESIDKFRFDVQVGERATERSKAKIYPEEYHLQELEDRLLYEESAFAADIRGKDRPVDRKLNGNGNKTNNSPQQIAEYIQPELWS
Full Text Available WP_027249761.1 ... 1117:7024 ... 1150:51478 1301283:72586 ... 54304:14 1160:19 ... alpha/beta hydrolase Plankt...othrix agardhii MTVATNPNKTVISVNGVDHYCEWVTTANSTPGSKPVMVFIHGWGGSGRYWESTAQALSQEFDCLIYDLRGFG
Full Text Available gellar biosynthesis anti-sigma factor FlgM Planktothrix rubescens MDFNFFLQSFLNGLSIGSVYAIFALGYTLVFSILGVINFAHG...INFGTATKPIMIRSVQVIIFTVCMVIVALLTYLVNKTKIGKALQAVAEDEITASLLGINPEQFIILTFFVSGFLAGLAGTL
Full Text Available family transcriptional regulator Planktothrix agardhii MALYTTVSFKSELNDKGWRLTPQRETILQVFQNLPKGNHLSAEDLYTLLKSRG...EAISLSTIYRTLKLMARMGILRELELAEGHKHYEINQPYPHHHHHLVCVQCNKTLEFKNDSISKTSMKQAEKSGFHLLDCQLTIHTICHEALRMGWPSLISTNWSCSKVIADGLSEIDEIECQ
Full Text Available WP_027249156.1 ... 1117:7027 ... 1150:51287 1301283:72374 ... 54304:1227 1160:523 ... carbonate dehydratase Plankt...othrix agardhii MNKTQQNLTISRRNLLKFGAGVAGTAVLTVGLGTKVSLFKAQPAVAQNNITPEEALKQLLEGNQRFIE
Full Text Available phosphoribosyl)-5-amino-4-imidazole-carboxylate carboxylase Planktothrix agardhii MNPEALQQLLESVASGQITPTDALDK...IKYFDFEPVGDFARIDHHRKLRTGFPEVIWGLNKTPEQIIKIIEVMRQRNPVVMATRIEPHVYQQLQAQIPDLRYYEIAKICAIHPDEIPRSNSTGIITILTAGTADL
Full Text Available family transcriptional regulator Planktothrix rubescens MLTQDQPLTETVFLAKLNEIIESNHLLKHPFYQMWTEGKLTLTMLQEYAQE...YYLHVHNFPTYVSATHAACDDINIRKMLLENLIEEERGSAHHPELWLRFAEGLGVERSAVLDRQRLNKTQESVQILKKLSRSEEAEKGLAALYAYESQFPEVSTTKISGLEEFYGINEESALSFFKVHEKADEIHSQMTRKALLQLCQTTEQQRAALDSVQTAVDAFNLLLDGVYEEYCQN
Full Text Available WP_026796900.1 ... 1117:3511 ... 1150:51681 1301283:72812 ... 54304:1582 54307:631 ... exosortase Plankt...othrix prolifica MATSLKKLLIGTSVAVGMSAVGITPALAGSLTNATIGGTASTDYLIYGKEGNKTVVIPNSVANLQSVLDGNAVSPTG
Full Text Available family transcriptional regulator Planktothrix agardhii MLTQDQPLTETVFLAKLNEIIESNHLLKHPFYQMWTEGKLTLTMLQEYAQEY...YLHVHNFPTYVSATHAACDDINIRKMLLENLIEEERGSAHHPELWLRFAEGLGVERSAVLDRQRLNKTQESVQILKKLSRSEEAEKGLAALYAYESQFPEVSTTKISGLEEFYGINEESALSFFKVHEKADEIHSQMTRKALLQLCQTTEQQQAALDSVQTAVDAFNLLLDGVYEEYCQN
Full Text Available family transcriptional regulator Planktothrix agardhii MALYTTVSFKSELNDKGWRLTPQRETILQVFQNLPKGNHLSAEDLYTLLKSRG...EVISLSTIYRTLKLMARMGILRELELAEGHKHYEINQPYPHHHHHLVCVQCNKTLEFKNDSISKTSMKQAEKSGFHLLDCQLTIHTICHEALRMGWPSLISTNWSCSKVIADGPSEIDEIECR
Full Text Available WP_026786388.1 ... 1117:7027 ... 1150:51287 1301283:72374 ... 54304:1227 59512:475 ... carbonate dehydratase Plankt...othrix rubescens MNKTQQNLTISRRNLLKFGAGVAGTAVLTVGLGTKVSLFKAQPAVAQNGITPDEALNQLLEGNKRF
Full Text Available WP_026796248.1 ... 1117:5667 ... 1150:53378 1301283:74697 ... 54304:990 54307:186 ... alcohol dehydrogenase Plankt...RLRNLRISLELMLTPMLQARIHDQLDQTKILQQCARLIDEGKLKILVNKTFPLASASEAHQLLEAGGMKGKLVLTIE
Full Text Available mine monophosphate kinase Planktothrix agardhii MLIKDIGEQGLLEIVKGFCPSEIVGDDAAILAVSGDESLVITTDMLVDEVHFSDRTTSPF...DVGWRGAAVNLSDLAAMGAFPIGITVALGITDNKTVSWVEQLYQGLTTCLNQYQTPIVGGDICRSAVTCISITAFGRVNPKLAIRRSVARPGDKIIVTGDHGDSRAGL
Full Text Available RLSQDTPRCEWVSPEVLEAMATTVHPDGVIALAPRIPSKPQTLEGLGIALETLQDPGNLGTIIRTAAATGVEGLWLSADSVDLDHPKVLRASAGAWFGLNKTVSPNLA... WP_026789106.1 ... 1117:7814 ... 1150:52263 1301283:73459 ... 54304:2105 59512:1069 ... rR...NA methyltransferase Planktothrix rubescens MLTSLQNPLVKQIRKLHQAKGRKEQQLFLLEGTHLVEAACEVGYPLTTVCYTSSWQGRHQPLLE
Full Text Available WP_026787843.1 ... 1117:7208 ... 1150:51556 1301283:72673 ... 54304:147 59512:85 ... histidine kinase Plankt...ALNLSSVVIILTANDTVIDCRNAFKFGAWDYISKNMRGNVFDAVHDSIEEAITYFNRWGNVHNEQWITENLESLEKDYWGKYIAVINKTVIETADKEDSLNALLEQRK
Full Text Available WP_026797957.1 ... 1117:53289 ... 1150:51871 1301283:73023 ... 54304:1753 54307:1193 ... h...QILKDASKVSGDQLKYGQQLVQDFGNNLQAIDYADISNKTNSFLANINIPVNQLVLDSISFAGEQVLGKNKQLRKDMKVFLQSTPETMCQSYLDKVQGGDSSSWTAIE
Full Text Available WP_026787847.1 ... 1117:3748 ... 1150:52103 1301283:73282 ... 54304:1962 59512:816 ... NUDIX hydrolase Plankt...othrix rubescens MNKTLILEDFKVGVDNVIFSVDTEQNRLLVLLVKRKEEPFINTWSLPGTLVQKGESLENAAYRILAEKILVE
Full Text Available WP_027255308.1 ... 1117:42 ... 1150:53120 1301283:74412 ... 54304:758 1160:1574 ... molecu...GLLLQVLPKAATDEELITKLESRVASLSGFTPLLRANKTLPDILQELLGDIGLVILPESQLVRFDCSCSFERVLGALKMFGTEELQDMIEKDNGAEAKCEFCGEMYQANSDHLHQLIEDLRIKPEPEEVRKNSILF
Full Text Available WP_026796771.1 ... 1117:6176 ... 1150:52159 1301283:73343 ... 54304:2011 54307:550 ... cytochrome C6 Plankt...othrix prolifica MKKLLSVLILSFLLLTVLLPKSALAEGVLSGSTIFSNSCAACHINGNNVIVANKTLKKKALTKYLKGYEENPLAAIINQVTNGKNAMPNFKSRLTAREITTVAAYVAEQAEKAWSPLQ
Full Text Available ltransferase type 11 Planktothrix agardhii MAPVSYWDAQLYDSHHSFVSNLAVDLLELLDPRIGEHILDLGCGTGNLSYKITNTGAEVIGIDKASTMIKKANKT...YPGLNFLVIDGANLVWKEQFDAVFSNAVLHWIKQPEKVISGVCQALKPGGRFVAEFGGKGNIDTIITAIDQALDAAGYPKNKTLNPWYFPSISEYGML... WP_027255198.1 ... 1117:7185 ... 1150:53327 1301283:74641 ... 54304:944 1160:203 ... methy
Full Text Available 5-phosphoribosyl)-5-amino-4-imidazole-carboxylate carboxylase Planktothrix prolifica MNPEALQQLLESVASGQITPTDA...LDKIKYFDFEPVGDFARIDHHRKLRTGFPEVIWGLNKTPEQIIKIIEVMRQRNPVVMATRIEPHVYQQLQAQIPDLRYYEIAKICAIHPDEIPRSNSTGIITILTAGT
Full Text Available WP_026796473.1 ... 1117:3549 ... 1150:52595 1301283:73827 ... 54304:285 54307:343 ... haloacid dehalogenase Plankt...LRDVVQKFGERLGFSPTPTELESLANSIQDWQPFPDTIAALKALKQKYKLVIISNIDDNLFAQTNQHLQIEFDHIITAQQAQSYKPSAHNFQFALNKTGLSSDKLLHVAQSIFHDIATANSLGLTTVWVNRRQGQPGGGATKAAIAQPDLEVPDLKSLVDLIFEV
Full Text Available WP_027254483.1 ... 1117:7024 ... 1150:51478 1301283:72586 ... 54304:14 1160:19 ... alpha/beta hydrolase Plankt...othrix agardhii MTVATNPNKTVISVNGVDHYCEWVTTANSTPGTKPVMVFIHGWGGSGRYWESTAQALSQEFDCLIYDLRGFG
Full Text Available SSAPNLEVIPSSLEEVSEIESALNIANMANLSNEELEELEQQEKLIRDKKGQISLARKEGIEIGREEGIGIGREEGIGIGREEGIGIGKEEGIGIGREEG...IGIGREEGIGIGKEEGIGIGREEGIGIGREEGIGIGREEGIGIGREEGMRILVKRQIRRKFGEVPPEVQTQIEQLSLEKLDILGDEIFDLAIIADLENWLANNG ...
Full Text Available SSAPNLEVIPSSLEEVSEIESALNIANMANLSNEELEELEQQEKLIRDKKGQISLARKEGIEIGREEGIGIGREEGIGIGREEGIGIGREEGIGIGREEG...IGIGREEGIGIGREEGIGIGREEGIGIGREEGIGIGREEGIGIGREEGMRILVKRQIRRKFGEVPPEVQTQIEQLSLEKLDILGDEIFDLAIIADLENWLANNG ...
Full Text Available oxin T-superfamily Coleofasciculus chthonoplastes MGGDNSKKPVLGLMNKGGFCCSVIGEEGGFCCSVIGEEGGFCCSVIGEEGGFCCSVIGEEGGFCCSVIGEEGGFCCSVIGEEGGFCCSVIGEEGGFCCSVIGEEGGFCCSVIGEEGGFCCSVISAIADFSP
Full Text Available xylamine reductase Arthrospira platensis NIES-39 MFCEQCEQTASGQGCHQWGACGKGPDVNAVQDLL... YP_005071100.1 NC_016640 1117:23174 ... 1150:27534 1301283:45982 ... 35823:1467 118562:2804 696747:2804 ... hydro
Full Text Available YP_007084156.1 NC_019694 1117:23174 ... 1150:9233 1301283:90386 ... 1158:1065 118323:857 56110:857 ... hydroxylam...ine reductase Oscillatoria acuminata PCC 6304 MYCSQCEQTAGGEACYQWGACGKSPEVDALQDLLTHL
Full Text Available NRQRDPFDFDLILEDQAVELAREIANRCQAGFVVLDAERDMARIVFAEGTVDFARIEGESLEQDLHRRDFTINAIAHHLQSQTLIDPLQGKQDLEARVIKMVSPQNLRDDPLRLLRAYRQAGQLTFTIDESTR...ESIREIAPCLQAVAAERIQTELSYLLATSQGSFYLQQAWEDGVLSPWFPNLTQESV
Full Text Available PEDINPSRRKLAETVSKQIAISLANLKLRETLQNQSFRDVLTGLYNRRYLEASIVRELHRVSRSNSTLGVILFDIDHFKRFNDTWGHDAGDAVLKAVGNLLQESTR...ESDIACRYGGEEFLIIMPDASLEDTQRRAHQLQIEIKHLQVSTRSQQLESITVSMGVASFPEHGLNYELLLRTADEALYRAKQQGRDRIVCAV ...
Full Text Available ASASYICHPKAPKRIAKVLPHVKLIALLRDPVDRAYSHYHHTKRIGRENLSFEEAIAQEETRVRQIESGGRRPGNNQPQAYNYTYLSSGLYAEQLQNWLEQFSKQQLLVLNSEDFFRNPPSSFKQVINFLKLPSWSLKNYRKHNFNQYPEPLKESTRESLTEYFRPHNHKLFELLGTDFGWSH
Full Text Available HNHEVLDAYEPNDLCAVLERHSGFESTRESWDTDRGIDISFMEAQRSPNVESGDQIEKKLPSNSDKTAIKKNGIIGRKVPEVRSKIVQGYLGKSKTEMTSKGKKSSLA...DWFVINHSGKPESQDAANCMLSLEGDSSNAKPSRKDVLVDDSFMVHARSTADDPYDSQWKTDIRTAADLTLSSQPENGTAD
Full Text Available in Cal7507_4468 Calothrix sp. PCC 7507 MYSQQLRTAIYGFFKRSHHLQLNCIDLQLNCIDLQLSCIDLQLSCIDLQLSCIDLQLNCIDLQLSCIDLQLSCIDLQLSCIDLQLSCI...DLQLSCIDLQLSCIDLQLSCIDLQLSCIDLQLSCIDLQSLNYPFLHFTDSFFVVMGTLKI ...
Full Text Available VPEQWWEKQNRCYLLWQFDKPPDVVIEIVSNREGDELEGKLSRYEQMRVSYYAVFDPNHELGPEELRIFELRGRHYAEMTETWLEQVGLGLTVWEGVFEGKQARWLRWCDAQGQLLLTGDERAELERERAEQERQRAEQERQRAEQERQRAEQERQRAERLAARLRALGEDPDLES ...
Full Text Available ANVGIYTTPSRPPIVSDVFLSLDVTVPEQWWEKQNRCYLLWQFDKPPDVVIEIVSNREGDELGGKFSRYEQMRVSYYVVFDPNHELGSEELRLFELRGRHYAEMTETWLEQVELGLRLWDGVFEGKQARWLRWCDAKGQLLLTGDERAELERQRAEQERQRAERLAARLRALGEDPDLER
Full Text Available AQEILNIYASDGEHIFPNFSLSKEIYKRLTLILNAFLLGKLIEVDSRTRFLASQPIFDGTELEKSTRKVFGEETFGVFKKKNKRVIVIAYDCWNSIPVIFDSEDPIYHNLKIVDILMASSAYPGGFPSRDISE...PSFLEQWIKQSDSRCSHPPNNLLPVVDGGLAANNPALIALSEYLKQDHQKSSVILA
Full Text Available n of unknown function DUF1830 Calothrix sp. PCC 6303 MAQILDSLPPEQSGKILCCYVNATSKIQVARITNIPNWYFERVVFPGQRLAFEAPRAAHLEIHTGMMASAILSDNIPCDRLVISEPSDPEPETNSTTEDENTCKNTIAPSIDNSTGDTPKNFKIAGLV ...
Full Text Available rotein Sta7437_0097 Stanieria cyanosphaera PCC 7437 MNYQFLNIPNPQSKQEAVISEPSNINRNTGENLAVPPAALIVIPGGLLVLAAILGFYRKINLTKIKDRSLFETDEQTTCKNCRFFSHNPYLKCALHPSRVSTTESIECADYWSNKSDRFQQKSK ...
Full Text Available tein Nos7524_5075 Nostoc sp. PCC 7524 MSLSYDISSILNLLRSLPSTELRTVKEEIDSILKERGTTIRIPDPFKIVPAQVVLKDSNLEESTSEVKLEEEYQQINEDISEPSGVLNLSSIKDATDNKAEKKEAIQEIPRPLGIWKGKVEISEDFYETTNDILSEFGIEE ...
Full Text Available YQGLHLGEVQLKGENIRINIGQVLRGKPLQLLETIRVSGAVAISNQNLQASISSALLGEGFKGLLETLLEHQGISEPSQLLDNYDIDWLGAYLDNNYFILQGKLTHDGVTQPLKIKAKLTLIPPQTLHLSEVEINGIPDLNNHNIKDFSVDLGSDVSINSFELDTDKLICEGELLIRP ...
Full Text Available TTKKKQSIFPAGKAAAILMEIDRRRRQLSLIPIDERFTSQRNKLPRILTEKHSHEPIHPDASSSIQTPLHPPRRLFIHPDASSSIQTPRHPFRRLVIHPDASS...SIQTPRHPFRRLVIHSDASSSIQTPRHPFRRLVIHSDASSSIQTPRHPFRRLVIHSDASSSIQTPRHPSLNSSAWMLFRRTVRIVISSNGCVSSLV
Full Text Available 745:703 ... 171637:1864 ... 721813:1696 ... 3749:1696 ... 3750:1696 ... PREDICTED: cytochrome P450 87A3-like Malus domest...ica MWSLVGLSFLVSLVVIFITPWIXKWRYPKCNGALPPGSMGLPFIGETLSLIIPSYSHDLLPFIKKRVRRYGPIFRTS
Full Text Available ATKDDNIKSALAELFPRSSSANLHHLKPLYVTAHIEGYPVSKVFVDYGATVNIMPMNIMKALRRSNDELIPSGIIMSSFIGDKSQTKGVLPLTVNIAGRTHMTAFFVVDSKTEYNALLGRDWIXQTSCIPSSLYQVLVFWDGFNDEGRPTRISVQKAIEVGAETVHQDSARLGLANFLPEADV
Full Text Available hypodium distachyon MMLSMPTDCWPDPVDCYEHLFCHMMQIYSLKLAYTSADADTSGNPILLYGFMAVRDCLNPR...RNYVFRRTRDDPFVVVQDSNGSSFIRMSGPKRGIEMQSLVLVEFDMRIKIGENEEDDLQLIDGAIYFDSLVLPPDMIINRRIVGDCGAVDMSLAFLHCAAEATIQVGI
Full Text Available rotein MicvaDRAFT_0707 Microcoleus vaginatus FGP-2 MIDFNTVTEFSHTYCIAICAFLVPANLLTTLVTVILTALNRPRIQIWASVVVASLWATAMIFHVFCWFAIGVVMPPTYILLVMGITCLTINVWAIAHPASMMQLIRVAVSVVRGSLQRKKDLVILERRM ...
Full Text Available achyantha MEQEGTGKEENDGSRWTEKLSPMEQEGTGKEENDGSRWTEKLSPMEQEGTGKEENDGSRWTEKLSPMEQEGTGKEENDGSRWTEKLSPME...QEGTGKEENDGSRWTEKLSPMEQEGTGKEENDGSRWTEKLSPMEQEGTGKEENDGSRWTEKLSPMEQEGTGKEENDGSRWTEKLSPMEQEGTGKEENDGSRWTEKLSPME...QEGTGKEENDGSRWTEKLSPMEQEEPLLEGLLQNGRLKFGQLMEQTISKVPEALSDAGGFSEIPCIMEDASNANDSPHSSVTGAKTV
Full Text Available IEEVGERYFDELLSRSLFQRPRLDRPSFTMHDLINDLAMFVSRKFCFRLDQKNSHEVPERVRHLSYMSEEFDISSKFEPLKGVKCLRIFFPVSLAPFGSEYPGYVSNK...HPDSVEIGKQIARKCNGLPLAAKTLGGLLSCNLDYKEWNHILNSNLWDLHANSVLPSLRLSYHYLPTYLKRCFAYCSIFPKDYEFEKENVILLWMAEGLIPHAKNEKA
Full Text Available LVAAAEEELSPDAMEILCQRFPLNSRCQGANAATPSSDETTEESTPAEDSISPENSIEETTPGAPLPDSSPEGITPAPEEPLPGTTPEGLTPLPGSVPGDEVTPSEDDAPTNITPMPGTTPEGEVTPSEDDA...PTNITPMPGTTPEGEVTPSEDDAPTNITPMPGTTPEGEVTPSEDDAPTNITPMPGTTPEGEVTPSEDDA...PTNITPMPGTTPEGEVTPSEDDAPTNITPMPGTTPEGEATPSEDDAPTNITPMPGTTPEGEATPSEDDAPTNITPMPGTTPSEDDAPTNITPNEDSVTPEGEATPSEEDDAPTNITPTPDGVTPDSDSPSNMDNPGGTSLPDADIPANTEEQTNPSEGGAKILAPQ
Full Text Available lus vulgaris MKVMKKKKSDAESQHDDYNEFKSDEDKDDTQYKKEKKERRKKKMSEEGKSKEYNEFESNEGEDDAQSK...KRKKKKLSTESKSKEYKVLERNEGEGDVEGKKGKLKKLSEEGKSKKYNEFESNEGEDDALGKKRKKKNLSTESKSKEFKVLERNEGEGEGEGDVEGKKRKLEKLSEEGKSKECNEFENNDGEDDA...QGRKRKKKKFKESKSEEYNVFERNEGEGDVKGKKTKKKKLSEEGKSKEYNEFENNEVEDDAQGKKKKLSKESTSEEYNVFERNGGEDYIEG...KKRKKKKPSEEGKSMEYNEFQNNEGEDDAKGKKTKKKKPGEEGKSKEYNEFEKNEVEDDAQGKKKKLSKESTSEKYNVFER
Full Text Available vulgaris MKVMKKKKSDAESQHDDYNEFKSDEDKDDTQYKKEKKERRKKKMSEEGKSKEYNEFESNEGEDDAQSKKRKK...KKLSTESKSKEYKVLERNEGEGDVEGKKGKLKKLSEEGKSKKYNEFESNEGEDDALGKKRKKKNLSTESKSKEFKVLERNEGEGEGEGDVEGKKRKLEKLSEEGKSKECNEFENNDGEDDA...QGRKRKKKKFKESKSEEYNVFERNEGEGDVKGKKTKKKKLSEEGKSKEYNEFENNEVEDDAQGKKKKLSKESTSEEYNVFERNGGEDYIEGKKRKKKKPSEEGKSMEYNEFQNNEGEDDA...KGKKTKKKKPGEEGKSKEYNEFEKNEVEDDAQGKKKKLSKESTSEKYNVFERNEGE
Full Text Available KEEKKEKRKHKRHRHHKSKSKRRHTTENSDSDESDDKDEGRKRVHSAVEHKREVKRSRQVKKDSREDSSDTDDNEPRKRRQERPEDDAPRRRRQDTPEDDAPRRRRQDTPEDDAPRRRRQDTPEDDA...LRRRQGTPEDDAPRKRRQDTPEDDAPRRRQQDRPEDDAPRRRQQNTPEDGEPRRREQEMALD
Full Text Available LNSRCQGANAATPSSDETTEESTPAEDSISPENSIEETTPGAPLPDSSPEGITPAPEEPLPGTTPEGLTPLPGSVPGDEVTPSEDDAPTNITPMPGTTPEGEVTPSEDDAPTNITPMPGTTPEGEVTPSEDDA...PTNITPMPGTTPEGEVTPSEDDAPTNITPMPGTTPEGEVTPSEDDAPTNITPMPGTTPEGEVTPSEDDA...PTNITPMPGTTPEGEATPSEDDAPTNITPMPGTTPEGEATPSEDDAPTNITPMPGTTPSEDDAPTNITPNEDSVTPEGEATPSEEDDAPTNITPTPDGVTPDSDSPSNMDNPGGTSLPDADIPANTEEQTNPSEGGAKILAPQ
Full Text Available VREENSQLQSQLSEVQTLLGQAHHHNEQLQSQLDRTEQQKAELQSQLDRVSSALQNVELQLADRSQQLSDTQSQLSSVQGERDQLQSELETANQTVAQLQSQLGDTQSQLSSVQGERD...QLQSELETANQTVAQLESQLGDTQSQLSSVQGERDQLQSELETANQTVAQLESQLGDTQSQLSSVQGERDQ...LQSELETANQTVAQLESQLGDTQSQLSSVQGERDQLQSELETANQTVAQLESQLGDTQSQLSSVQGERDQLQSELETANQTVAQLESQLGDTQSQLSSVQGERDQLQS...ELETANQTVAQLESQLGDTQSQLSSVQGERDQLQSELETANQTVAQLESQLSSVQGERDQLQSELETANQTVAQLESQLGDTQSQLSSVQGERDQLQSELETANQTVAQLESQLGDTQSQLSSVQGERD
Full Text Available GYMPSLLLDPNVKNPSQIQQLKDFGTGIHSQVSQEPSPYTEFYVQQRTSNPRKCKFMGCVKGARGASGLCISHGGGQRCQKPGCNKGAESKTTFCKTHGGGKRCEHLGCTKSAEGKTDFCISH...GGGRRCEFLEGCDKAARGRSGLCIKHGGGKRCNIEDCTRSAEGQAGLCISHGGGKRCQYFSGCEKGAQGSTNYCKAHGGGKRCIFSGCSKGAE
Full Text Available KRSGGYMPSLLLDPNVKNPSQIQQLKDFGTGIHSQVSQEPSPYTEFYVQQRTSNPRKCKFMGCVKGARGASGLCISHGGGQRCQKPGCNKGAESKTTFCKTHGGGKRCEHLGCTKSAEGKTDFCISH...GGGRRCEFLEGCDKAARGRSGLCIKHGGGKRCNIEDCTRSAEGQAGLCISHGGGKRCQYFSGCEKGAQGSTNYCKAHGGGKRCIFSGCS
Full Text Available KRSGGYMPSLLLDPTVRNPSQMQQLKDFGTGIHSQVSLEPSPYTALSVQQRTSNPRKCKFMGCLKGARGSSGLCISHGGGQRCQKPGCNKGAESRTTFCKTHGGGKRCEHLGCTKSAEGKTDFCISH...GGGRRCEFLEGCDKAARGKSGLCIKHGGGKRCNIENCTRSAEGQAGLCISHGGGKRCQFSSGCEKGAQGSTNYCKAHGGGKRCIFSGCS
Full Text Available ... 70447:1749 ... 70448:2455 ... TRAPP 20 K subunit (ISS) Ostreococcus tauri MSASAALTVVNANGRSVYERELGSSADSVDTDAH...VRELIGRAALDFADARSWESSATYLRLVDRFNDADAHGYRTSGGGRFVLTLRGRLRGNAGDETIRQFFTDAHEAYAIAKMNPMRDEDEDLGEAFDRAVRESFRRRLAPLFPFARTDE
Full Text Available 436017:4609 distinct from photosynthetic electron transfer catalyst, CYC6, partial Ostreococcus lucimarinus CCE9901 RDLERNGVATKEDISNLIERGKGKMPGYGESCAPKGACTFGARLDAEEIDALATYVLDRAAVDW ...
Full Text Available CELACDAGCDLGESPIWDAATSTLYFVDINSKRIHSYSPASGAHRTIQLEQPIGTVVPTSNPNILLAALEASCLPSLADCLDAWLARDIVEVDVAAGTTGRVLATTPE...EHGVDGMRFNDGKVSPQGTLLVGRMHCKWRDGQRGRLYRLDPGSSQLVEVLRPEEVHLPNGMAWDEAKGVVFYVDSGAETI
Full Text Available CEHGVKPRSRCKVCGACPHGKQRSRCKECGGSGICEHGRVRSLCKECGGSRICEHGRRRYECKACGGSQICEHGRERCRCKECGGGSICEHGRQRYRCKECGGSSICEHGRQRYRCKECG...GSQICKHGRERSKCKECGGSQICEHGRERCRCKECGGGSICEHGRQRSQCKECGGSAICEHGRHRSYCKECGGSAFCEHGRQRSQCKECG...GSQICEHGRIRSKCKECGGGSICEHGRMRSQCRECGGGSICEHGRRRSRCKECGGPRISTPP
Full Text Available GPCEHGVKPRSQCKVCSACPHGKRRRHCKECGGSQICEHGRVRSQCKECGGASICEHGRQRHRCKECGGAGICEHGRQRSVCKECGGSSICEHGRIRSTCKECGGSQICEHGRQRHRCKECG...GGGICEHGRQRSVCKECGGSQICEHGRVRSTCKECGGAGICEHGRQRHRCKECGGASICEHGRQRRYCKECGGSGICVHGRQRHSCKECG...GGGICEHDRQRHRCKECGGSQICEHGRVRSTCKECGGGSICEHGRRRSGCKECGGGGICEHGRQRSRCKECGGGSICEHGRRRCECKECGGSQICEHGRRRSQCKECGGASICEHGRHRHQCKECRAAKAKQSR
Full Text Available EEEEDASNKGTKRKRAPYTKGPCEHGVKYRSKCKVCSACPHGRERRYCKDCGGSKICEHGRQRDYCKECGGGAICEHGRERHRCKECGGSGICEHGRRRSRCKECGGSGICEHGRVRSRCKECG...GGSICEHGRERSRCKECSGSGVCEHGRERSKCKECGGASICEHGRQRSHCKECGGGSTCEHGRERRYCKECGGSGICEHGRIRSQCKECG...GSGICEHGRRRSDCKECGGSQICEHGRIRSTCKECGGSQICEHGRQRSYCKECGGGSICEHGRRRSRCKECGGSQICEHGRERSKCKECGGASICEHGRQRSQCKECG...GSGVCEHGRQRTRCKECGGASICEHGRVRSQCKECGGGGICEHGRQRSKCKECRAAKAGTHS
Full Text Available KGPCEHGVKYRSQCKVCSACPHGKWRYTCKECDGASICEHGRMRSTCKECGGSQICEHGRIRSQCKECGGASICEHGRQRSSCKECGGASICEHGRERRRCKECGGSEICEHSRRRTECKECG...GSGICEHGRVRSQCRECGGSAICEHGRVRSRCKECGGSAICEHGRVRSRCKECGGGAICEHGRVRSRCKECGGGAICEHGRIRSECKECG...GGSICEHGRRRSRCKECGGSQICEHGRRRNQCKECGGSQICEHGRRRTRCKECGGSEICEHSRQRYQCKDCLS
Full Text Available GTKRKRAPYTKGPCEHGVKPRSQCKVCSACPHGKRRRYCKECGGSQICEHGRIRTLCKECGGSRICEHGRERRRCKECGGGSICEHGRQRSYCKECGGSGICEHGRQRHYCKECGGGSICEHGRRRSECKECG...GGSICEHSRVRYTCKECGGSQICAHGRQRSTCKECGGSQICEHGRIRSTCKECGGSQICEHDCIRSTCKECG...GGSICEHGRQRDYCKECGGSRICAHGRERRYCKECGGSGICEHGRQRKQCKECGGSAICEHGRQRHQCKECRGSSVPVGRWVL
Full Text Available KGPCEHGVKPRSRCKVCSACPHGKWRKQCKECGGASICEHGRIRSVCKECGGASICEHGRQRSQCKECGGSEICEHGRHRSKCKECGGSQICEHGRQRHRCKECGGSSICEHGRHRPQCKECG...GASICEHGRHRYSCKECGGASICEHGRHRSKCKECGGSQICEHGRQRSRCKECGGGSICEHGRERSLCKECGGSQICEHGRRRSRCKECG...GGSICEHGRIRSQCKECGGASICEHGRQRSQCKECGGSQICEHGRRRSQCKECGGGSICEHGRIRSQCKDCRCERL
Full Text Available NREKASPATAVKATFFLITGHLQPNSTLIESLIAQGHEIANHGTQDLRTSQLAADAFASQFREANDILSRLAGQELRWYRPGQAFYNQSMRSFLGTFPGYESRFALASMIPLDTGKATNHPQFTTWYISQFIFPGAILVLHGGSPERDENTALVLKNLLGKLHDQGYQVVTLSQLVDHKR
Full Text Available DTPSSNTLNENPSVNSLVETGQVVSDTERQTSQNNNDNNNVNSQDKPPETKPPETKPPETKPPETKPPETKPPETKPPETKPPETKPPETKPPETKPPETKPPETKPPETKPPETKPPETKPPETKPPET...KPPETKPPETKPPETKPPGHKPPETKPPETKPPETKPPGHRPPGHKPPETKPPGHKPPETKPPETKPPETKPPETKPPETKPPETKPPETKPPETKPPETRPPETKPPDTKPPETKPPETRPPETKPPETKPPETKPPETKAPGN
Full Text Available HFEVLKWARANGALWDENTCSSAAWSGDLQILQWARANGCPWDGETCSEAAIHGHLELLQWARANGCPWDESTCSRAAEGEELEVLKWARANGCPWDTKTCAEAACGGQLEMLQWARANGAPWDEETCS...KAAEGEDLEVLKWARENGCPWDTKTCAEAATGGQLEMLQWARANGAPWDEETCSKAAEGNELEVLQWARANGCPWDKETCKKAEEGGHLEVLRWAGENGAP
Full Text Available transferase Halothece sp. PCC 7418 MKIVIARDFNDFARCIMIRTQVFVMEQGISAEIETDEWENHSTHYLAGDGEKALATARSRLINNQTAKIERVAVLKEARSQGVGTELMRYILQEIHSYSNIQTIKLGSQNSAIPFYEKLGFQVIGEEYLDAGIPHHLMMQRINT ...
Full Text Available CEAGDVQGAIEDCNQALRINPKLAEAYCNRSNARCESGDVEGAIEDCNQALRINPKLAEAYLNRGNARRESGDIKRAIEDYNQGLRINPNLAQAYRNRGFARCESGDF...KGAIEDFNQAIRINPNLAQAYQNRGFARCESGDFKGAIEDFNQALRINPNYAEAYYNRGLAHNYSGDRQAEIEDFNQALRINPNLAEAYLNRGVTRRESGDVKGAIEDYNQALHINPNLAEAYQNRGFARC
Full Text Available sp. CC9605 MIAPLPMPAEPLLEQYGQGARLCPCANDQITLVFSQEYPFDLVELEQLLEAVGWSRRPIRRVRKALSHSLLKVGLWRHDPRVPRLVGFARCTGDGVFEATVWDVAVHPLYQGNGLGKQLMAYILEALDQMGTERVSLFADPGVVSFYQGQGWDLEPQGHRCAFWYAN ...
Full Text Available DIISSFIDDSIAIIDSVISGDKGDDVIGLEAGLFNSSVIGGDGNDIISTENAFVTGGSIEGNNGEDSLSLGLLDSVAVNGNADNDVIDLRGNIESSSIRGGQGNDTIS...SFIDDSITITDSVIAGDKGDDVFGLEADLFNSSVIGGDGNDSISTVNAFLTGGSIEGNDGEDTLSLGLLDGVAVNGNADNDVI...DFRSDIESSSIRGGQGNDTISPAFFINITDSVIAGDKGDDAIILGELNIETPLYNSTFNNASSADVDTVIDGGAGNDVIAFDGTQVEDATILGGEGADTFQLWNGG
Full Text Available ursor Glycine max MKKASAMHCALLRFWVPLLLLASFSFAPSVLAKAEISGNEVNTHGKVVSEELTKSSLKEQNEEEKFKGFFPKPIPIIKPISKPI...PIIKPIPKSIPIVKPIPIPVYKPISKPIPIVKPIPKPFLIVKPIPNDEEKFKGVFPKPIPIVKPIPKPIPIVKPIPIPIYKPIPKSVPIVKPIPIIKPILKPNPIVKPI...PKLIPIVKPIPNPLRVKKSIPAFGSEEFLKPKPFFEKPIPKLPLDPKFKKPLLPPLPIHKPIPTP
Full Text Available e Solanum tuberosum MIDEVKEEWPETPSFLNPETPNSQNPETPTFSNPESPTFSKSETPTFSMPETPTFSKPETPSFSKPETPSFSKPET...PSSQKLEASTFSKTETPTFSKLETPSFSKLETPISPNPETPTFSKPKTPSFSKPEIPSFSKPKTPSFSKSETPTLSKPETPSSPKPETPNSPKIEAPSFSKPETPSFSKPET...PTFSNPETLSSPKSETLTFQKPEIPSSPKLETQSSSKPETPSFSKPETPTFSKSKTPSSSKPEMPSSPKPETPSFSKPEILTFSKPKTPSFSKPETPSFSKPETPSFSRPET...PSFSNPETPSSSKPEPETLSSPKPKTPSSAKLETPSFSKLETPSFSKPETPSSLKPETPSFSKPET...PSSPNPKTPSSPKSETPSFSKPKTPSFSKPETPSSSKLETPNFLKPETPSSLKLEAPPTFLKPETSSSTKPKTPSFSTPETPTFSKPETPTFSKSETPSFSKSETPSSFKPETPSFSKPETPSSPKFETPSSPKPETPSSPKT
Full Text Available NASKDTPSSNTLNENPSVNSLVETGQVVSDTERQTSQNNNDNNNVNSQDKPPETKPPETKPPETKPPETKPPETKPPETKPPETKPPETKPPETKPPETKPPETKPPETKPPETKPPETKPPETKPPET...KPPETKPPETKPPETKPPETKPPGHKPPETKPPETKPPETKPPGHRPPGHKPPETKPPGHKPPETKPPETKPPETKPPETKPPETKPPETKPPETKPPETKPPETRPPETKPPDTKPPETKPPETRPPETKPPETKPPETKPPETKAPGN ...
Full Text Available 0:266 ... 424551:266 ... 424574:266 ... 4107:266 ... 4113:1088 ... PREDICTED: proteoglycan 4-like Solanum tuberosum MPTLSKLEIPNSPNPET...PGSPKSVTPSISKPKTPSFSKPETPSFSTPETPSFSRPETPSFSKPETPSSSKPEAPSSLTPETPSFSKPETLSFSKPET...PSSPKLEIRNSAKPETPSFSKPETPSFSKPKTPSSPKPETPSFSKPKTPSSPNLKTPTPSSPNSQTPSFSNSRKPEAPTFLKPETPSSPKPKTPSFSTPETPTFSKPET...PNFSKSETPSFSKPETPSSFKPETHSFLKSETPSSPKPETPSSPKFEPPSSPKPETPSSPKTENPSSPNTETPNFSKPETPSSPKPNTPSFPKLDTPSFSNPKTPSYETPSFPKFETTSSQKPETPNSPKFGTPSLPKSKIPSDPIFETISFSKPETSNSSKPKIPTTP
Full Text Available VVDNPSFLQLSNSSSSSSNASKDTPSSNTLNENPSVNSLVETGQVVSDTERQTSQNNNDNNNVNSQDKPPETKPPETKPPETKPPETKPPETKPPETKPPETKPPETKPPETKPPETKPPETKPPETKPPET...KPPETKPPETKPPETKPPETKPPETKPPETKPPETKPPGHKPPETKPPETKPPETKPPGHRPPGHKPPETKPPGHKPPETKPPETKPPETKPPETKPPETKPPET...KPPETKPPETKPPETRPPETKPPDTKPPETKPPETRPPETKPPETKPPETKPPETKAPGN
Full Text Available e Solanum tuberosum MAQHYKLSSILLLAFIYFIHDHMITTITARRILQTPSFSTPTTPSFSMPTTPSFSKSPGVSKPASPSFSNSPSLSKPET...PSFSKSETLSFSKPETPSFSTSETPSFSKPETPSFSKPETPSFSKPEIPSFSKPETPSSPRLETPIFIKPETPTFSKPETPTFSKPKTPSLLKPETPSSQKPETPTFSKPET...PIFSKSETHSFSKPETPTSPNPETPTFSKPETPSSPKPETPSFSKPETSSFSKPETPTFSKPETPSSPKSETPSFSKPETPTFSKPET...PSSPKSETPTFPKPKIPSSLKPETPSSPNLETPSFLKPETPIFSKPETPSFSKPEMPSSTKPETPIPQSPRPLLSQSLKSQILQTPRPQLETSSFSKPET...PSFSKPETPSSSKPEAPSSPTPEMQSFTKPETPSFSKPETPNSPKPETPSFPKPETSTFSKPQTSNSPKSETSSSPKPETSSFSKSETPSFSKPEMPSSPTPETPNFSKPET
Full Text Available -repair coupling factor Arthrospira platensis C1 MCDRKFWAAKERGFRAIALVSYSLFRNHIITSPSPSPSVGARFRAIALVSYSLFRNHIIT...SPSPSPSVGARFRAIALVSYSLFRNHIITSPSPSPSVGARFRAIALVSYSLFRNHIITSPSPSPFVGARFRAIALVSYSLFRNHIITSPSPSPFVGARFRAIALVSYSLFRNHIIT...SPSPSPFVGARFRAIALVSYSLFRNHIITSPSPSPFVGARFRAIALVSYSLFRNHIITSPSPSPFVGEGFRVRGNVLSDW ...
Full Text Available ibiotic biosynthesis monooxygenase Chamaesiphon minutus PCC 6605 MPYVLIIHEVADYDAWKQ... YP_007097338.1 NC_019698 1117:24722 ... 1118:14027 1301283:4479 ... 217161:1887 1173032:1887 1173020:1887 ... Ant
Full Text Available YP_007118146.1 NC_019730 1117:843 ... 1150:13380 1301283:30256 ... 1158:4799 482564:3448 179408:3448 ... Antibiot...ic biosynthesis monooxygenase Oscillatoria nigro-viridis PCC 7112 MILEAVVLNVKSGCEGD
Full Text Available NYNSMNASIEIKQQESCQTNINHESCMFSKCMGGMQRFAIPPLPSFEVEQLNVVQGSRHCLSPHFQNSLVTFISYQKEKES... ... 1003877:124 ... 3655:124 ... 3656:1142 ... PREDICTED: transcription factor bHLH143-like Cucumis melo MVGTDTWQLH
Full Text Available 29:4530 ... putative DNA alkylation repair enzyme Leptolyngbya sp. PCC 7375 MTAKDISKILRDLADPVIAEHSQRFFKTGKGEY... WP_006519474.1 NZ_JH993796 1117:18358 ... 1150:46430 1301283:66978 ... 47251:4271 1021
Full Text Available WP_023070590.1 NZ_AWNH01000001 1117:18358 ... 1150:46430 1301283:66978 ... 47251:4271 1385935:793 ... dna alkyla...tion repair enzyme Leptolyngbya sp. Heron Island J MTADQISKTLRDLADPAIAEHSQRFFKTGKGE
Full Text Available ongatus PCC 7942 MNLLENEHNQRSLEAYRSTTLAERLTELAQSECIETRAAVAYNPSTPIAILETLANDSSLEVLTSLAENPNTPSPILEQLASHPDPELRAALATNPQLSSHTLEQLAHDPILAIRIAVANHKNTPSLTLKRLSVDSSSQVRQAAFQKLKPRTSKGKKKKR ...
Full Text Available WP_016872372.1 ... 1117:3375 ... 1189:4225 ... 1123:456 1124:456 ... ferredoxin Chloroglo...eopsis fritschii MVYQITSECISCKLCQSVCPTGAIKMVDNRLWIDPDLCTNCVDSIYTVPQCKASCPTGNGCVKVTTDYWENWFNTYNRLVAKLNKKPDYWENWYKLYSQKFSEQLQKHQQQMA
Full Text Available VQGKKLEEATLSQARRQIGFCFQDANDQLFMPTILEDVTFGPRNYGVPPGVARDRALALLDEFGLVEFANRSAHELSGGQRRLAALAAILALEPAILILDEPTNGLDPSWRRHLAKVLSQLPIEVILIASHNLNWIGKTTQRALVLADGEIKIDRKTQDLLADKSTLEYYNLPPDW ...
Full Text Available VVEVQNLVYAYSQQEPVLKEISFTLKRGDRVALMGATGSGKSTLLETLIGLKQPRTGEIWINGIKVAPKTLPQIRQHIGFAFQDANDQLFMPTILEDVTFGPRNYGVP...PAVAIDRARQLLADFGLEAYAHRSAHELSGGQRRLAALAAILALDPAILILDEPTNGLDPAWRRHLAQVLLKLSVQVILIASHDLHWLGRVTQRALVLSGGQIQIDGDIQPLLQDGNTLDQLGLPIDW
Full Text Available family enzyme, subfamily IA,REG-2-like HAD hydrolase, subfamily IA Leptolyngbya sp. PCC 7375 MTIPKVIFLDAVGTLFGVKGSVGEVYQALAQQAGVQASAH...ELDKAFYRSFAVANAMAFPGVPDVEIPHQEYLWWLAIAKDTFQRAGVFQEFSDFEVFFEGLYQHFATAAPWMVYQDTVNSLKR
Full Text Available PQSGKIWINGVPILPNTLPQIRQQIGFSFQDPHDQLFMPTILEDITFGPLNYGISPIAAKEKAHQLLADFGLESYAHRSAHELSGGQRRLAALAAILALDPSILILDEPTNGLDPAWRRHLAQVLLKLPVKVMLIASHDLQWLGRLTQRALVLSNGKIQIDSDIQPLLQDGKTLDKLGLPVDW
Full Text Available FMFNMPTTSNIAEIPLEVRPLQPIPSASSSIQQVLQFLQPTQSTTSSVQPAMQSFQQVPSTTTSPLRQAMQPLQQVPSTTTSPVHQQAMQSLPLIPSASTSPHQQAMQSLPLIPSAS...INNQPYRELSQSELLRLTSVAVQRQFYQINMENAMMSWFSEYSIPQQSLSQLRELSQALDPNPPSSSSSSLRSLDSPDSSS
Full Text Available SAAGDAAGGGGGCTAAVAACSQQLTSSSRLCGPSSLGYDCMAAWRNVTVHWSVNTTSAPANPCTPATKTILGAEVVTANGSLHMAVESRAPGYVALGFAAVAGKMVPS...DIILGWVTNNDTAPSVGTFDASQSGRLSANNNTNNSWAYDRGMSYNAATGVTTLCFSRLLSDDRAKSVPDLRASSGYQCTVVQDKVILHWSVNLRTAPANPCTPATKT
Full Text Available ri f. nagariensis IVSAVDAAGLTPLHFAAAANSLPVVRLLLAAGAPYAAQATADNLEAFMPANAGWTCLHVAAMRGHYEVAAAVLRFHVSRRPSPSSSPSSPSSSPSSSSSPSPSPSSSPSS...SSSSSSSSPSSSSSSSSSPSSSSPSSSSPSSSSSSSSSSSSSSSSSSSSSPSSSSSSSSLVRFRAKRLSAATCLASQG
Full Text Available Prunus mume MARSSAVVLVIMAALLVASTRALSPASSPSKSHAASPSSTPPSAAPSPSSHPPKAASSASPSSSPTAASSASPSSSPKAASSASPSS...SPPSLSPVSTPPSSSPSSTPPTATTSPSSTTPTTSPSPSSTSPEAAPSSKPAANSPPSPPSSSPVASPEISPSSSGDAPAPAPSGAISNRLAVAGSLASGVFAGVLVM
Full Text Available SVAAARGLGDGEPLSRRKNAARRLRGGEPLGARKNPADPHTSPGTPNCSPAPPQSGGGGYIPPSPSSGVSPTTPGGGGGYY...PPSPSVGTSPTTPTTPGGGGGGYYNNPPSPDIGTSPTTPTTPGGGGGGYNAPPSPSSDTSPSTPGSGGGYGAPPSPSSDTSPSTPGSGGGYGAPPSPCSGTSPSTPGGGGGYGAPPSPSS...DTSPSTPGSGCGGGYGAPPSPSSGTSPSTPGSGCGGGYGAPPSPSSDTSPSTPGGGGGCNAPPAPSGDTTPSTPGGGGGGYGSPPSPSSDTSPTTPVGGGGYGAPPSPSS...DTSPTTPGGGGGYGAPPSPSSDSSPTTPGGGGGYYGPPSPSSDTSPTTPGGGGGYYGPPSPSSDTSPTTPSAPSGGYYGPPSPSS...DTSPTTPGGGGGGGHYGPPSPSSDTSPTTPSTPSGGYYGPPSPSSDTSPTTPGITPTPDVPLPPISTPPTPYSPLTPTPTTPTPYDPNTP
Full Text Available ITIITIITIITIITIITIFTIITIITIIIIITIITIIITIITIIITIIITPSSSPSLPSSPSSPSSPSSPSSPSSSSSSSPSSQSSSSSPSSPSSPSSSPSSPSSSPSS...SHHHHHHHHHTIIITIIITPSSSPSSSPSSPSSPSSSPSSSPSSSPSSSPSSSPSSSHHHHHHHHHTIIITIIITITVVAAEKASPNAGEALRAYL ...
Full Text Available WP_007310489.1 ... 1117:12011 ... 1118:3290 1301283:19045 ... 263510:984 263511:984 ... Na-Ca exchanger/integri...n-beta4, partial Crocosphaera watsonii MDNTQFRGAFSRETNTIYFSEKLVRDTVLQFETVEDINSDNQYIQAFV...AVWLEEIGHSVEAQLNEEEIAGDEGRILAAVVIGESYTRDQLKQWQQEEDQNEIIINNQTINAEFSVVEDDLYGLLTGPVARWG
Full Text Available ZP_11685390.1 1117:4412 1118:3009 263510:243 263511:243 423471:1998 Na-Ca exchanger/integri...n-beta4, partial Crocosphaera watsonii WH 0003 MDNTQFRGAFSRETNTIYFSEKLVRDTVLQFETVEDINSDNQYIQAFVAVWLEEIGHSVEAQLNEEEIAGDEGRI
Full Text Available H endonuclease Nostoc sp. PCC 7107 MSSLYINAELRRLVARRADYICEYCLVSESDRSSGCQVDHIISVKHGGATTADNLCYACIFCNLQKGTDLGSINWQTGELVRFFNPRRDFWGEHFRLGEGVIQPLTDIGEVTARIFDFNCDERVIERQALILSGQYPSKSALKRINK
Full Text Available n Prochlorococcus marinus str. MIT 9211 MPGIGPRTAQRLALHLLRQPEERIKAFANALLNARNQVGQCQQCFHLTEGNECEICLNQNRQRNLICV...VADSRDLLALERTREYKGLYHVLGGLISPMDGIGPELLNISPLVKRITSEETTEVILALTPSVEGDTTSLYLAKLLNAFVKVTRIAYGLPVGSELEYADEVTLARALEGRRTVE ...
Full Text Available 47368:3137 ... 147385:3137 ... 15367:3137 ... 15368:3137 ... PREDICTED: bifunctional monodehydroascorbate reductase and carbonic anhydrase nect...arin-3-like Brachypodium distachyon MATRVGNAVVFALLLCARFL
Full Text Available 9:7 3650:7 ... 1003877:7 ... 3655:7 ... 3656:1095 ... PREDICTED: pectinesterase inhibitor-like Cucumis melo MANNSCLV...IVSLIGVLLFTIILNVASSNYVISTICSKSSNPPFCSSVLKSSGTTYLKGLAVYTLNLAHTNASKSLTLARSLATTTTNPQ
Full Text Available NNDNKLEVDRISSFSWSGQRRMPSYLNKLVFPENFLTALRTIAMQEDEISKVSSLLEELVGSGGERQPSDAEVRAAVWETCGDSGALQLLVDLLQAKLTELEESSGMEDYDSELLLKSCITESQGQHASCENNSSEETNGWTQHKMSRKTWSSIVYRRGQKELALLFLKEAEHALQLALTEGN
Full Text Available l regulator Rivularia sp. PCC 7116 MNSLKNKPLDPVNHAGFLIWQVANNWEKQINNELKEFGLNQAEYFHLVSLFWLLENQEEVTQTEIARFADTIPMNTSKIMTKFEKKGLITRVAGSDSRSKSLCITESGEQIAIQATARLSRLSEQFFDKDDDNNFLNYLKYLKTK ...
Full Text Available KCKEIVEREGERIDEEEYVNDADPSILRFLLASREEVSSMQLRDDLLSMLVAGHETTGSVLTWTLYLLSKDSSTLLKAREE...VDRVLKGRPPAYEDIKDLKFLTRCITESLRLYPHPPVLIRRAQVDDILPGNYKVKAGQDIMISVYNIHHSSQVWERAEEFVPERFDLESPVPNETNTDYRFIPFSGGPRKCVGDQFALLEAIVALAIFLQRLNFELVPDQDISMTTGATIHTTNGMYMKLSERRSKFDISSPTSSK
Full Text Available AREEVDRVLKGRPPAYEDIKDLKFLTRCITESLRLYPHPPVLIRRAQVDDILPGNYKVKAGQDIMISVYNIHHSSQVWERAEEFVPERFDLESPVPNETNTDYRFIPFSGGPRKCVGDQFALLEAIVALAIFLQRLNFELVPDQDISMTTGATIHTTNGMYMKLSERRSKFDISSPTSSK
Full Text Available QSREDYEAMLQEKAEPVLQEWMQRCITESLLTPRAVYGYFPAARDGNTLRVFDADGTRELGFFELPRQRSGNRYCIADFFNDLDAEGRPKDVLPMQAVTMGQKASVVA...MDAKRSDNWTNNKGFLADAPQGVGLDEEGTTSENAEETSTSASDAPAADLPPVSSDRSDAVPAEAAPVPPFLGSAVITEVDIDITEVFHYLDRNALFAGQWMLRKTKE
Full Text Available 1:1983 ... 7-cyano-7-deazaguanine reductase Cyanobium sp. PCC 7001 MTASHPPASSQGAPAAAASQAEPDGATRTPLYGERAIAEASL... WP_006911498.1 NZ_DS990557 1117:4682 ... 1118:6723 1301283:22859 ... 167375:1134 18028
Full Text Available 27:244 ... 7-cyano-7-deazaguanine reductase filamentous cyanobacterium ESFC-1 MNSMSSETEVAQTPEVKYGERAIADCELITF... WP_018395974.1 NZ_KB904821 1117:4682 ... 1150:39727 1301283:59529 44887:1731 ... 11284
Full Text Available WP_015116097.1 ... 1117:4682 ... 1161:4218 ... 1162:3122 1177:1104 317936:3404 ... NADPH-d...ependent 7-cyano-7-deazaguanine reductase Nostoc sp. PCC 7107 MTQDTSEVKYGERNIAEGNLITFPNPRVGRRYDINITLPEFTCKCP
Full Text Available :535 ... 7-carboxy-7-deazaguanine synthase, partial Prochlorococcus sp. scB243_498G3 MTNFLPLVEQFHSLQGEGYHAGKSAFFVRLAGCKVGCSWCDTKNS ... WP_029986136.1 NZ_JFMQ01000246 1117:6374 ... 1212:1251 ... 1217:2126 1218:2126 1471492
Full Text Available :556 ... 7-carboxy-7-deazaguanine synthase, partial Prochlorococcus sp. scB241_527E14 RSGVNSISGSYDWITLSPKRHSPPKNYFLKNCNEMKIIINEIEDIEFAIQIKKETLKQYQLSKSEDGL ... WP_029953398.1 NZ_JFKU01000143 1117:6377 ... 1212:1253 ... 1217:2123 1218:2123 1471444
Full Text Available IPQSDIAPGAPASAIHGGSRGGRDRYRGDYGSRYDRPDRGYRSGYDRPPPRGYDRGYGGGYDRAPYPDRGYGDRGYDRGYAGEGGGYDRG...HPDPYAYPPKDPYARDAYGYEEAPQSYGYSDRGGYAAPYTRGGPITDRYNGGDRYSGSAGAAAPAERSPDRYARAPGGPDRDYGAARYGGGARPGPYERSGEAPGRAPPREAPRSR
Full Text Available ed factor 2N-like Cicer arietinum MDDGLSPDRGYGLSPDMGYGLSPDRGYGLSPDRGYGLSPDRGDGLSPNRGDVLSPDRGDGLSPDRG...YGLSPDRGYGISPDRGYGISPDRGYGLSPDRRYRLSHNRRYMLSHNRRYRLSPDRRYGLSLDRRYRLSPNRRQHLSGDRR
Full Text Available erase Synechococcus elongatus PCC 6301 MSDYSDRSDFSERQDRSSDDRPNFRRFDRDRPSEGRRLEGRSEGGYRGRDDRGGSGGYRQNRDDRGGFRGRDDRG...GYRGGDRDRPSEGRRFEGRSEGGYRGRDDRGGSGGYRQNRDDRGGFRGRDDRGGYRGGDRDRPSEGRRFEGRSEGGFRGRDDRGGSGGYRGRDDRGGFRGRDDRGGFRGRDDRGSSGSYRGRDDRG
Full Text Available DVLKPGSEMVAAGYCMYGSSCTLVFSTGSGVNGFTLDPSLGEFILTHPDIKIPKKGKIYSVNEGNAKNWDEPTRLYVENAKFPKDDSSPKSLRYIGSMVADVHRTLLYGGIFMYPADKKSPFGKLRVLYEVFP...MSYLVEQAGGQAFTGKERALDLVPKHLHERSPIFLGSYDDVEEIKKLYSATIA
Full Text Available PPLVSVHPFPGGLEIMCGYSFGKSVFPMSRVLDILLKEGINVVSTTSIRRDGRFIHTIRSEDPNHLNMTGADYSELQIKLTEAISLQETLPEHEN ... ...RQRRQEMGKLCATLRSLLPLEYIKGKRSTSDYVNEAMNYINHLQNKVKQLQAKRDELVKVSNLKSNICSENESSSSSTTHL
Full Text Available omitrella patens subsp. patens MALCFDNGRCPWLSCSLVEAGSAGEASSLNGISAVSAHGNSKSLQPDEDVTCLLQADALAGIKGATSAYPKSKRSNRSVARVPVSLNCKSPSCVLGTGTATPDRVFPMSDFARDILHGFGENDTPAVREFSARVCKLITTHLPASL ...
Full Text Available 24:2780 3398:2780 71240:414 91827:414 71275:1623 91835:562 72025:981 3803:981 3814:981 163735:1026 3846:1026 3847:1026 PREDICTED: pat...ellin-4-like Glycine max MNNTNECCCNDENGKIVVGVPLVFNFFDKNTNNIEKSLKVQLEKKNQLEDHDCDQEDD
Full Text Available 24:2780 3398:2780 71240:414 91827:414 71275:1623 91835:562 72025:981 3803:981 3814:981 163735:1026 3846:1026 3847:1026 PREDICTED: pat...ellin-3-like Glycine max MAQNDSNPTPPPEPHVAAEPITEDLVQDKEEEDDSSKIVIPVPESESLSLKEDSNRVS
Full Text Available 24:2780 3398:2780 71240:414 91827:414 71275:1623 91835:562 72025:981 3803:981 3814:981 163735:1026 3846:1026 3847:1026 PREDICTED: pat...ellin-6-like Glycine max MMDTTSSPLSLQTQKTTFQELPEASPKPYKKGIVATLMGGGLFKEDNYFVSLLRSSEK
Full Text Available HPRKEREDGPTKVFRVQDFEDNEDFVVTWNESTLEVLCACYLFEFNGFLCRHVMIVLQISAVHSIPPRYILKRWTKDAKSRQTAGDLSMSDAVVSDSRAKRYNNLCQQAFQLGDVGSLSQESYIAAINALEAALRKCKSLNDSIHSVKEPNLPCSGSQEGILISNSVGHSNKRDSTLGKRK
Full Text Available YP_005070449.1 NC_016640 1117:22154 ... 1150:29420 1301283:48078 ... 35823:3164 118562:2384 696747:2384 ... creat...ininase Arthrospira platensis NIES-39 MHNFIPPHRFFPYLTWTEIAEMPDRENTVIIQPIGAIEQHGPHLP
Full Text Available DALIKEKVAVGDVIYIEANSGAVKRVGRSDAFATEFDLEAEEYVPLPKGEVHKKKEIVQDVTLQDLDAANARPQGGQDILSLMGQMMKPRKTEITDKLRQEINKVVNR...YIDEGVAELVPGVLFIDEVHMLDMECFSYLNRALESSLSPIVIFATNRGVCNVRGTDMPSPHGVPIDLLDRLVIIRTQIYN
Full Text Available INKVVNGFIDKGTAELVPGVLFIDEVHMLDMECFSYLNRALESSLSPIVIFATNRGICNVRGTDMNSPHGIPVDLLDRLVIIRTENYGPAEVIQILALRAQVEELHLDEESLAYLGEIGQRSSLRHAVQLLSPASIVAKMNGREEIRKADLEEVCALYLDAKSSAKLLQDQQEKYIS ...
Full Text Available ALIKEKVAVGDVIYIEANSGAVKRVGRSDAFATEFDLEAEEYVPLPKGEVHKKKEIVQDVTLHDLDAANARPQGGQDILSLMGQMMKPRKTEITDKLRQEINKVVNRYIDEGVAELVPGVLFIDEVHMLDMEC...FSYLNRALESSLSPIVIFATNRGICNVRGTDMASPHGIPVDLLDRLVIIRTQTYDL
Full Text Available RYIDEGVAELVPGVLFIDEVHMLDMECFSYLNRALESSLSPIVIFATNRGVCNVRGTDMPSPHGVPIDLLDRLVIIRTQIY...YDALIKEKVAVGDVIYIEANSGAVKRVGRSDAFATEFDLEAEEYVPLPKGEVHKKKEIVQDVTLQDLDAANARPQGGQDILSLMGQMMKPRKTEITDKLRQEINKVVN
Full Text Available RFHPFFCDFAFTGFPKWLVCDSCVESFSLKQQEYSSDVKEGGVADKSGSYLSRECGCCIGGVDFVSLIFTLSAVPQKKMSSAIMECFSVLKPGGLLLFRDYGLYDMTMLRFEQEKRVGFREYMRSDGTRSYFFCLDTVRDLFVGVGFIELELEYCCVKSVNRRKGKSMQRVWVHGKFQKPA ...
Full Text Available Malus domestica MEDKLSIVAKSIAPSPIQELSHLARRCNAINLAEGFPDFPAPSHIKHAAISAIHSDFNQYRHVQGICDHLANIMKQMHGLDVNPLTDMA...ICCGQSEALAAAAFAIIDKGDEVILFDPCYETYEGCIKMAGGVSVYVPLDPPHWTLDPNRFINAFNERTKALVLNSPHNPTGK
Full Text Available KLCVTKGESPQQILQNPQPATMRYPLECSRYIWQVRASKALAGGSCLSKSSSCPSLMEPTLVKGLTGALGFSVPSSLLSESAQSGCMIVESYSPKSSFRAS ...SSAAIVYVGLVHVYMLQSPYEILGCIMPYFLTHSRGRFFSVGSKSDNNWIGSDAKKHKEKKRLPPSRIHAPVFVQPSSLTD
Full Text Available IGISIQSRGEREQQFDQQLVQSILDKFPQSSLMFFPHEPADVNYQTNILKEIEKTHNNLSKRIIIIPYDTLPNIHKALVKKCKVCIGTRFHFQLFALSTATPSISIFKRLKTLGIFRDYHIENLCFNPLECSEPVENTLDMLADVLQSCENYRDLLKNCIANDKLASHIVLQRAIERFHSLT
Full Text Available DWSCSGTTKAHESTVWDLTQEKVKEGGRFATVSDDGWVKIWKAGGAQRTEISGSSSSSGKKDDAAPLECSFRSGHDRPILCVDWLGDANRKNSESGNKKQPATLAAGGGDNSVRVYREKETGVWEEAAAVVNAHEDDVNDIAWFSMASEKEETEQSTNYFASASDDGTVKIWTFCSSNVDE
Full Text Available IAIVVVLLASVLSPLCAKNGLAVDVPLATEAPSASPQAPFYAPEEAPLPPAFQAPFAAPSPSQAPSQSPGLPDEPPSPSQAPSQSLGEAPS...QNPRPCSCYEPLSPLQAPSQSPGLSYEPLSPSQAPSQSPGLSYEPVSPSQAPSQSPGLSYGPPSPSQAPSQSPGLSYEPLSPSQAPSQSPGLSYEPPSPSQAPSQSPELSYEPSSPSQAPS...QSPGLSYEPPSASQAPSPSPGLSYESLSPSQAPSQNPGLPSKPPSPSLAPPPPKKKSPPKSIPPYMPWGPLPGHRIHPRLPPLREDIYQCWKTLCPIPYCVDRTYWSFYAGKIDVGSFCCKAFERTNDTCFRKMFLAFPNPRLKDALLTYCSKH
Full Text Available GNENPLATREFPHSGDVNPHGISCHISMETPLTCSSSSGRPSNSSEQLPCVTPLSPEDDIIHSLINDGDGYRHYSMDSIDGILLALERIQQGRELTHEVIFD ...SSYGLRNIRCNTISDSTQPGCSSSDSSLNRRKDTIKKRNCEGESSSTVRGKNITGSSLEGRNSGSRNGISISDSGRTRNTPSHRNTSMTPVRTQRSASGHARGRFYSQ
Full Text Available IHRSFDEPNWEPSFDHNDDTDSIWGFNPSTTKDFDSDKHRENDIFGSGNLGINPIRTESPHDDPFQRKSPFSFEDSVPSTPLSKFGNSPRYSEWAGEHHFDMSSRFDSFSMHDGGFSPPRETLTRFDSISSSRDFGHGQASSRGFDHGQTYSFDDSDPFGSTGPFKVSSDSQTPRKGSDNWGF
Full Text Available RTRCSEEVAAASAEVEVAKWGLATEGWAAEAKATVQVVEWRVAEGWAAEAEATVEVAELGVAEGSAAEAEAEASAEGWAAEAEATVQVEEWGVAEGSAAEAEATVQVAELGMAEGSAAEAEAEATV...QVAEWGVAEGSAAEAEATVEVAELGVARGGAEATVEVAELGVAEGSAAEAEATVQVAEWGVAEGSAAEAEATV...EVAELGVAEGWAAEAEATVQVEEWGVAEGSAAEAEATVQVAELGMAEGSAAEAEATVQVAEWGVAEGSAAEAEATVEVAEWGVARGGVASADSAGSVG...PVVNSVGAAVGVAVASDSAPRLPKTQSWTKVDAVASADSVDSVGPVVNSVGAAVGVAMASDSVGPVVDSVGAAVGVAMASDTVAEWGVAEGSAAEAEATVQVAEWGVAEGSATEAEATV...QVAEWGVAEGSAAEAEATVQVAEWGVAEGSAAEAEATVQVAELGVAEGSAAEAEATVQVAEWGVAEGSAAEAEATVEVAELGVAEGSAAEAEATV
Full Text Available TVNAISGYSGGGKKLIAQYEGFRAEAPAEESRSPYSPYGLGFRHKHVKEMQKYARLHHVPLFVPAVGDFAQGMVVQIPLPL...WSLEDAPTGKQLHEKLSSHYANEPFVPVAPLNDQGQLRDGSFFETKSANQTNEVQLFVFANDETKEALLIARLDNLGKGASGAAVQNLNIMLGLSEKSGLSS
Full Text Available YP_007169459.1 NC_019779 1117:22154 ... 1118:7682 1301283:23924 92682:2160 ... 76023:2160 65093:2160 ... creatini...nase Halothece sp. PCC 7418 MLLHLQTWQEVEQYLEQSRGMIIPIGSTEQHGPTGLIGTDAICAEVIAKGVGENT
Full Text Available NKSDKLTFKFEVTNTTAVFLALLWLPSVLKIFALTGGAVKTPAGEITGSSMMPMLQSLTGDTLGFLIEQTKLAEDVAPPQQQLEMRQMRHEWQKEYASRVPASEARQQ...MERLSQRYKELRNSMPAGAKRTFEMESIAGRMRSLASEVNFSDEEVNQLLKSNDQGKRLLGLSVVEWSADPNYFYTILNIINNSETAFEQTCALRGAGKMVPKLNEQQKKDLMSALVHQRNYNEAEKCWIRPNSNRWSLSDRILSALEA
Full Text Available 3_3832 Microcoleus sp. PCC 7113 MQFPRRYFVLLPLTAVLSLLMVSCSESKVSQCNKIIKVANKAADEAKAITNGGKESDPKAMLKAADALDKASQEMESIKVSDDKLRDYQGRFFVMYRDTSKSTRDFITAFEKKDRPAAEAAIVKLQKTTALETPLVQEINKYCQPEK ...
Full Text Available BC 3 transport family Coleofasciculus chthonoplastes MFYKELLAISFDETFATVRNVPVNGLYLMLMGIIALTVVMLMQVVGVILLIALLSMPAAIANLYVKDLKKMMLLASLLCMIFTTAGLWLSYVLNFTSGATIVLCAAIAYLASLGINFLRRYRQRQTEFKR
Full Text Available family Coleofasciculus chthonoplastes PCC 7420 MFYKELLAISFDETFATVRNVPVNGLYLMLMGIIALTVVMLMQVVGVILLIALLSMPAAIANLYVKDLKKMMLLASLLCMIFTTAGLWLSYVLNFTSGATIVLCAAIAYLASLGINFLRRYRQRQTEFKR ...
Full Text Available GLGRAAINPGIIFSDADNVIPDLVWASTERLAILLDEAGHLTAAPELIVEVLSPGADNHRRDRDLKLRLYSVRGVQEYWVVNWQLQQIEVYRREQASLRLIATLLSNDELTSPLLPGFSCSVAQIFA ... ...wn function DUF820 Microcoleus vaginatus FGP-2 MNQTISDKVRWTTADLELLPDNGDRYEIIDGELFMTRAPHWGHQEACGNIYLELKTWSRAS
Full Text Available FPDVNNGSRVMLTTRKIDVANHIEMPTYVHKLKLLDAEKSWELFSSKALPSYKRSLIHNIHEFEELGRKLARKCNGLPLALAVLGGYLSKNLTVGAWSDLLGGWAS...TENGQVMRDILARSYNDLPDSSIKSCFLYLAVFPEDFSIFASELIELWIAEGFIQRTSKHTEEEIARKYIYELSQRSLVQVVS
Full Text Available 3323771 isoform X2 Prunus mume MMWVRARSSAGVNIRKLSTAVQRRIEDEGDWSYASEWWGTESEGHTVLRS...TSDKGNGVVSVVAYPSSKPSELHWASTERWLQKRYEEIHGCHEQNDRFRVLGYQWRALRFNDDTRQSTVKVLAAYRQSEPGVFAVMQQPHCLAVPYLKSMISAGLAAI
Full Text Available NMNMVLRREHVYCGNKLYITKAKFGGKEREISIDCRIGEDPRLYFSVDNKRVLQIKHLRWKFRGNERIEVDGVGVAVSWDVYNWLFDDDEDGYALYMFKFEKSSLDEEQFNNSMWSQQSCGFGFETKMMKKGVLRSSSSSSSSLSSASSSCSSSVMEWASTEENEMKDPTGFSLLVYAWKT
Full Text Available SRWSIIASKLPGRTDNDVKNYWNTKLKKKAMGAVQPPRGAGAGARAFSAAPATPPDTVQSQCTSSAQAPALSPASSSVTSSSGDACFATTMYPQPTTSPQQYIRFDAP...AAASQTELPPVPPTATVTPDGCNWASTEAGAVSLDDVFLGELTAGEQLFPYADLFSSFTGLPESKANLELSACYFPNMAEMWAASDHAHAKPQGLCNTLT
Full Text Available otein cyanobacterium PCC 7702 MQDIGNILDNRVLLVALVACLVAQTLKLVIELVKNRKLNVRALVTTGGMPSAHSALVTALAAGVGQTRGWASTEFAVATVFAIIVMYDAAGVRQAAGKQARILNQMIDELFDEHPEFTGDRLKELLGHTPFQVIAGSALGITISWLARYAYN
Full Text Available VADRPVAAISADRFELLSAYLDGEVTETERQQVESWLSSDASLRDLHSKLLTLHQEWVSLPAPTEAQPVDALLNGVFSRID...AIEQQEAVTPLVADRAIAAISADRVELLSAYLDGEVTETERQQVESWLSSDASFRDLHSQLLTLHQEWVSLPAPAEAQPVDALLNGVFSRIDEIEQQEAVMPLVGDRA...IAAICADRFELLSAYLDGEVTETERQQVESWLSSDAALRQVHAGLLTLRQELVSLPAPAEAQPVDALLSAVFSQIDAIEQQEAVMPLVGDRAVAAIPAEQFELLSAYLDGEATRAESEQVESWL
Full Text Available ulator Leptolyngbya sp. PCC 7375 MDAHAIAEQLGISAMAVRQHLYALQDEQLIAYEEEPRPMGRPAKLWHLTAAADRFFPEGYAELTLSLIHSVTEAFGADGLERLLDIRT...REQLGAYLAQMDGQDSWQDRLETLADIRTREGYMANVLPQEDGSMLLVENHCPICAAAEACTGLCDRELEIFQIVLGVEVERTEHILSGARRCAYRVVLDENQDE ...
Full Text Available otein, partial Physcomitrella patens PKPKTGGDCEGCADTSELKPPKPDVLGALDTPNAGVLEAPNAGVLEAPNAGVLEAPNAGVLEAPNAGVLEAPNAGVLEAPNAGVLEAPNAGVLEAP...NAGVLEAPNAGVLEAPNAGVLEAPNAGVLEAPNAGVLEAPNAGVLEAPNAGVLEAPNAGVLEAPNAGVLEAPNAGVLEAPNAGVLEAPK
Full Text Available QESLLQEVPLQALTPQEMHVEEVPSQEVSIHEMHTEVSSQEPPSQEMHAQDITPQESHTEEAPSQEMQSDDASQVPPQDIN...TEEVNPQDTHAEEVPSHEVPVQEIHPNEVPSQEVSVQEMQAEEAPAQAMQTEEAPLQEMHSEEAPPEEMHSEEAPPEEMRTEEAPLEEMCTEEAPPEEMWTEEAPPQEIRTEEAPPQEMQMEEAP...PQEMHIEEAPPQEMCREEAPPQEMHTEEAHTEEAPPQEMRTEEAPPQEMQTEEAPPQEMRTEEAPPQEMHAEEAPPQEMHAEEAPPQEMHAEEAPPLEMRAEEAPPLEMHTEEAPHSRDAHRGSTALKRCMPRKHRRKRCAQ ...
Full Text Available KEIAKEYQIEWDTTESENELLKPPEELINGPCTFVSASSMPVKPKPSQPSDLNKPTARSTGTEESPPMHFKDMESAAEAAAESANKAIAAAQAAAYLANKGPNLVTQS...VIREQNILAANEFLELFCELIIARLTIISKQRDCPADLKEGISSLIFAAPRCSDIPELLAIRDNFEKKYGKDFVSAATDLRPSCGVNRMLIDKLSVRTPMGEVKLKVM
Full Text Available YP_473535.1 NC_007775 1117:6436 ... 1118:902 1301283:25411 ... 1129:1600 321327:64 ... bacteriochlorophyll deliv...ery (BCD) family transporter Synechococcus sp. JA-3-3Ab MANIQRDPEVNGSAATALSSQPQPSAP
Full Text Available 7:713 ... 7-cyano-7-deazaguanine reductase Geitlerinema sp. PCC 7105 MSDLHSVSNSISENETAEAATEVKYGEREIQEGKLITFPN...PRVGRRYEIQITLPEFTCKCPFSGYPDFATIEITYVPDERVVELKAIKLYINSYRDRYISHEESANQILDDFVAACDPLEVRLKADFSPRGNVHTVIEVQHVKGDSL
Full Text Available WP_015172098.1 ... 1117:412 ... 1150:57188 1301283:78930 ... 63132:2073 1173025:1687 ... cyanate lyase Geit...lerinema sp. PCC 7407 MSIPEITEKLLAAKKAKDMTFADLEQILGRDEVWIASVIYRQASASEEEAQKIVEALGLGPEVAIALTD
Full Text Available YP_007107568.1 1117:24454 1150:5233 63132:9 1173025:9 3-dehydroquinate dehydratase Geit...lerinema sp. PCC 7407 MLSVLVLHGPNLNLLGLREPEVYGRSTLDDVNRLLEEEARSLQAEITALQSNHEGVLIDAIQAARGKHHGLLINPGAYTHTSVAIRDAIAGVAIPTVEVHLSNIHRREAFRHHSYIAPVAIGQISGFGVESYRLGLRALVAHLQSLDPA ...
Full Text Available WP_015172789.1 ... 1117:22259 ... 1150:58180 1301283:80033 ... 63132:2967 1173025:1771 ... ...LGLWAWERKYEGGTEYTVIPDTLTIGGPATLVGASDRGIEYTAPLQSRYASCVGETLEQPERYYHARFQNGQVTFRVDFTALPSGLYSEITHLNVVNARPYVRWAVVD
Full Text Available YP_007109584.1 NC_019703 1117:412 ... 1150:57188 1301283:78930 ... 63132:2073 1173025:...1687 ... cyanate lyase Geitlerinema sp. PCC 7407 MSIPEITEKLLAAKKAKDMTFADLEQILGRDEVWIASVIYRQASASEEEAQKIVEALGLG
Full Text Available YP_007110409.1 1117:287 1150:4700 63132:1220 1173025:1220 Proto-chlorophyllide reductase 57 kD subunit Geit...lerinema sp. PCC 7407 MSDACQWTPEAEARLKEIPFFVRPAARKKIEKFAQDAGITEITAEVYDQAKQKFGSN ...
Full Text Available YP_007111107.1 1117:5803 1150:2099 63132:201 1173025:201 Methyltransferase type 11 Geit...lerinema sp. PCC 7407 MISSNHVLWQQQHVVDVRDAAFAARTYREHADVRALLRRVTGAERLQSACEVGAGYGRMTVVLTEFAEQVTGLERERHFVEEATRLLPEIT
Full Text Available WP_015173342.1 ... 1117:19759 ... 1150:58278 1301283:80141 ... 63132:3054 1173025:1998 ... ...GRRGDETVITKTAKGYVLWVLEPDAYLDGAEAIAPQAETPAQPTQTSTPCKILDSKSQYRTCHIRVPDLQQRLAALWVDGKYYALFKVVPTVDKAMEITARFGRRGDETVIAKTKKGYSVWVLEPEAYPAPTP
Full Text Available :1456 ... dihydroneopterin aldolase Geitlerinema sp. PCC 7407 MDRLQVCGIRCYGYTGFFEEERKLGQWFEVDLVFGLDVRAAGASDRL...DDTLDYGGAVQRVQSLVRQARFATIERLATAIAEDILHHTVAEEVTVRLTKVSPPIPDFGGHIALEITRSRAELHPGT
Full Text Available WP_017658618.1 NZ_KB235955 1117:22359 ... 1150:57152 1301283:78891 ... 63132:2040 102127:282 ... fibrillin Geit...DELLGIDRFPVYNLGQIYQCIRVKDRKIYNIAELRGIPYLEGMVSVAATFQPTSTRRVDVKFQRFVVGLQRLIGYQNPDAFIEAMESGKKFFAVDFEITPGERSGWLEITYLDEDLRIGRGNVGSVFVLSKV
Full Text Available YP_007109191.1 1117:24406 1150:4823 63132:1446 1173025:1446 dihydroneopterin aldolase Geit...lerinema sp. PCC 7407 MDRLQVCGIRCYGYTGFFEEERKLGQWFEVDLVFGLDVRAAGASDRLDDTLDYGGAVQRVQSLVRQARFATIERLATAIAEDILHHTVAEEVTVRLTKVSPPIPDFGGHIALEITRSRAELHPGT ...
Full Text Available YP_007109048.1 1117:1269 1150:280 63132:1363 1173025:1363 pseudouridine synthase Geit...AEALAQLQQGVTIQDYRTKPAIARLLPEEPPLPPREPPIRYRKEIPTAWLEITLTEGRNRQVRRMTAAVGFPTLRLVRFAIGDLRLEGLAPGQWRDLSAPELAALKSLGKRPVSRARGSRPRGDRS ...
Full Text Available :239 ... 147380:239 ... 4527:239 ... 4530:309 ... 39947:309 Os04g0497700 Oryza sativa Japonica Group MEGDDKSAVVGGAYWGLAARACDACGGEAARLFCRADA...AFLCAGCDARAHGPGSRHARVWLCEVCEHAPAAVTCRADAAALCAACDADIHSANPLARRH
Full Text Available ARSEQSAAAAPPALLPTPPRSKMMPLLPTPCLVILPASFASPSQLAPKPGRADSVARWDAHKAGSVGATASPPTERRAANKGRSNCRADA...CDRWDSNKTASPSRSSTSTSSSSSRASSAERWDAHKKAQLQAGWDAHKAGSVGATASPPTERRAANKGRSNCRADACDRWDSNKTASPSRSSTSTSSSSSRASSAERWDAHKKAQLQAGAVDGEKGSDAANTTSPSGNSKQQYNERREMFAGPSFASLSPEPFMLPLPNFLMAH
Full Text Available ineum MESQAIQDELESLELEIKDVQGQISALIEHQDRLYERKSELKTLLKALAASGSPVASAGSSAIENWSEPFEWDSRADDVRFNIFGISKYRANQKEIINAV...MAGRDVLVIMAAGGGKSLCYQLPAILRGGTTLVVSPLLSLIQDQVMGLAALGISAYMLTSTSGKENEKFVYKALEKGEDDLKI
Full Text Available vulgaris MRMYCALFCFWFPLLLLSNLASVFATTQISGNEVNKYGNNAEGHDEESKFKGLFTKPIPFVKPILKPVPVIKPIPKPVPVIKPILKPIPVIKPI...LKPIPVIKPILKPIPVIKPILKPIPVIKPIPKPVPVIKPIPIPVFNKPVPKSISIKPTPKPIPDIKPIPSEEVKFKGLFPKPIPKIKAIPKVIPVVKPI...YIPVVKPLPKIVPIVKPIPILKPKLVPIVKPIPILKPIPKPFTVKKPVPTVESEELLKPKPLFKKYIPKIPFHHQFKKPFVAPLPIHKPIPTP
Full Text Available VNIDGKPLHTKPFPIEEKTLPIPLFKPLPKLVPIVTPIPNPLPIPVFQKPIPKWIPDIKPIPNPIPIPVFPIPKPIPTPVIKKPI...PKPSPIVEKPKPKPKPCPVAEKPIPTPVIKKPIPKPIPIVKPIPKPSPIVKKPLPIPVIKPVPKSTPTVKPVPIPVFKPIPKPFTIVKKPLPIPVIKPVPVPVPIIKPIPIPVFKPIPKPFPIPLFKPI...PKPFPIPFFKPIPKPIPTVKKPQPIPVTKPLPNPVPVVKPIPKPTPMVKKPLPIPVTKPIPKPIPIVKPIPIPVFKPIPKPIPIVKKPLPIPVTKPIPKPVPIIKPI...PIPVFKPVPIIKPIPIPVFKPIPKPFPIVKKPLPIPAIKPIPKPIPSLKKPLPTPTIKPVPEPVPTVKPIPIPTFKPFPKPFPFVKRPLPIPPMMKPIPKPFTSVKKPPPIPISKPIP ...
Full Text Available KTLPIPLFKPLPKLVPIVTPIPNPLPIPVFQKPIPKWIPDIKPIPNPIPIPVFPIPVFKKPIPDVMPIPKPSPIVEKPKPKPCPVAKKPIPTPVIKKPIPKPSPIVEKPKPKPKPCPVAEKPIPTPVIKKPI...PKPIPIVKPIPKPSPIVKKPLPIPVIKPVPKSTPTVKPVPIPVFKPIPKPFTIVKKPLPIPVIKPVPVPVPIIKPIPIPVFKPIPKPFPIPLFKPIPKPFPIPFFKPI...PKPIPTVKKPQPIPVTKPLPNPVPVVKPIPKPTPMVKKPLPIPVTKPIPKPIPIVKPIPIPVFKPIPKPI...PIVKKPLPIPVTKPIPKPVPIIKPIPIPVFKPVPIIKPIPIPVFKPIPKPFPIVKKPLPIPAIKPIPKPIPSLKKPLPTPTIKPVPEPVPTVKPIPIPTFKPFPKPFPFVKRPLPIPPMMKPIPKPFTSVKKPPPIPISKPIP
Full Text Available ursor Glycine max MRMRKQSTMQCALFRFWVPLLLLASFSYAPSVLATTEISGNQVNMDGKVVNEELGKPNLKGQDEEEKFKGLFPKPI...PIVKPFPKLIPIIKPIPKPIPVVKPIPIPLYKPFPKSIPIVKPIPKPIPNGKPIPTEEAKFKGFFPKPIPIVKPIPVAKPIPIVKPFPKLIPIIKPIPKPIPIVKPIPIPVYKPI...PKSIPIVKPIPNGKPIPTEEAKFKGFFLKPIPIVKPIPVAKSIPIVKPIPITVYKPITKSFPTVKPIPKPIPILKPIPKPISIVKPISKPFIVQKPIPAVEPEKFLKPKPFFKKPLPKFPLNPKFKKPLLPPFPIHKAIPTP
Full Text Available vulgaris MRMHSTLQSAVLRFVPLLLVITFCYAVTSAATTQVSGNEVNTNGIASEELTKTKHYEEEKFFHHHKPLFKKPIPFIKPVPKPIPFVKPI...PIYKPYPKPVPVPVYKPIPVYKPVPIYKPIPKPVPVYKPVPIYKPVPKPVPIYKPIPKPVPVYKPIPKPVPVYKPVPIYKPIPKPVPIYKPIPKPVPVYKPIPKPVPVYKPI...PKPVPVYKPIPKPVPVYKPVPIYKPIPKPVPIYKPIPKPVPIYKPIPKPVPVYKPVPIPVFKPIPKPVPFYKPIPVFKPVPTPFPIIKPIPKPVPFYKPIPVFKPVPKPFPIIKPIPKPVPFVKPIPIFKPIPKPLP
Genome-wide analysis of orthologous clusters is an important component of comparative genomics studies. Identifying the overlap among orthologous clusters can enable us to elucidate the function and evolution of proteins across multiple species. Here, we report a web platform named OrthoVenn that i...
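The "overlap among orthologous clusters" can be made concrete with a small sketch. The assumption here is that each cluster has already been reduced to the set of species contributing proteins to it. Each cluster then falls into exactly one Venn region, keyed by that species set. The function names and toy data are hypothetical and are not OrthoVenn's interface.

```python
from collections import Counter


def venn_regions(clusters):
    """Count clusters per Venn region.

    `clusters` maps a cluster ID to the set of species with at least one
    member protein; each cluster lands in exactly one region, keyed by the
    frozenset of species it contains.
    """
    return Counter(frozenset(species) for species in clusters.values())


def shared_by_all(clusters, species_list):
    """Number of clusters that contain every species in `species_list`."""
    wanted = set(species_list)
    return sum(1 for species in clusters.values() if wanted <= set(species))


# Toy input: five clusters over three hypothetical species A, B and C.
clusters = {
    "c1": {"A", "B", "C"},
    "c2": {"A", "B"},
    "c3": {"A"},
    "c4": {"B", "C"},
    "c5": {"A", "B", "C"},
}
regions = venn_regions(clusters)
print(regions[frozenset({"A", "B", "C"})])  # 2
print(shared_by_all(clusters, ["A", "B"]))  # 3
```

The exclusive region counts always sum to the total number of clusters. The "shared by" counts do not, because one cluster can be counted for several species combinations.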
Mihail R Halachev
Full Text Available Among proteins, orthologs are defined as those that are derived by vertical descent from a single progenitor in the last common ancestor of their host organisms. Our goal is to compute a complete set of protein orthologs derived from all currently available complete bacterial and archaeal genomes. Traditional approaches typically rely on all-against-all BLAST searching, which is prohibitively expensive in terms of hardware requirements or computational time (requiring an estimated 18 months or more on a typical server). Here, we present xBASE-Orth, a system for ongoing ortholog annotation, which applies a "divide and conquer" approach and adopts a pragmatic scheme that trades accuracy for speed. Starting at species level, xBASE-Orth carefully constructs and uses pan-genomes as proxies for the full collections of coding sequences at each level as it progressively climbs the taxonomic tree using the previously computed data. This leads to a significant decrease in the number of alignments that need to be performed, which translates into faster computation, making ortholog computation possible on a global scale. Using xBASE-Orth, we analyzed an NCBI collection of 1,288 bacterial and 94 archaeal complete genomes with more than 4 million coding sequences in 5 weeks and predicted more than 700 million ortholog pairs, clustered in 175,531 orthologous groups. We have also identified sets of highly conserved bacterial and archaeal orthologs and in so doing have highlighted anomalies in genome annotation and in the proposed composition of the minimal bacterial genome. In summary, our approach allows for scalable and efficient computation of the bacterial and archaeal ortholog annotations. In addition, due to its hierarchical nature, it is suitable for incorporating novel complete genomes and alternative genome annotations. The computed ortholog data and a continuously evolving set of applications based on it are integrated in the xBASE database, available
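Using a pan-genome as a proxy reduces the number of alignments. A greedy sketch shows the idea: a sequence is added as a new representative only if no existing representative already matches it. This is an illustration under simplifying assumptions, not the xBASE-Orth procedure. The toy `identity` function and the 0.9 threshold stand in for BLAST scoring, and the short strings are made-up sequences.

```python
def identity(a, b):
    """Fraction of identical positions; a toy stand-in for a BLAST alignment."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def build_pan_genome(genomes, threshold=0.9):
    """Greedily collapse coding sequences from several genomes into representatives.

    Returns the list of representatives and the number of pairwise
    comparisons that were performed.
    """
    reps, comparisons = [], 0
    for genome in genomes:
        for seq in genome:
            for rep in reps:
                comparisons += 1
                if identity(seq, rep) >= threshold:
                    break  # already represented by an existing sequence
            else:
                reps.append(seq)  # no representative matched: new pan-genome member
    return reps, comparisons


# Two hypothetical strains of one species; the first genes differ by one residue.
strain1 = ["MKTAYIAKQR", "MSDLHSVSNS"]
strain2 = ["MKTAYIAKQK", "MSDLHSVSNS", "MNQTISDKVR"]
reps, n = build_pan_genome([strain1, strain2])
print(len(reps), n)  # 3 6
```

At the next taxonomic level, only the three representatives would be compared against other species. The five original sequences would not be used.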
Curtis, Darren S.; Phillips, Aaron R.; Callister, Stephen J.; Conlan, Sean; McCue, Lee Ann
At the rate that prokaryotic genomes can now be generated, comparative genomics studies require a flexible method for quickly and accurately predicting orthologs among the rapidly changing set of genomes available. SPOCS implements a graph-based ortholog prediction method to generate a simple tab-delimited table of orthologs and, in addition, HTML files that provide a visualization of the predicted ortholog/paralog relationships, onto which gene/protein expression metadata may be overlaid. AVAILABILITY AND IMPLEMENTATION: A SPOCS web application is freely available at http://cbb.pnnl.gov/portal/tools/spocs.html. Source code for Linux systems is also freely available under an open source license at http://cbb.pnnl.gov/portal/software/spocs.html; the Boost C++ libraries and BLAST are required.
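The following sketch shows how a graph of predicted gene pairs can be turned into a tab-delimited ortholog table. Connected components are computed with union-find, and each component becomes one row with one column per genome. This is not the SPOCS algorithm. The input edge list, the `genome|gene` ID convention and the column layout are all hypothetical.

```python
import csv
import io


def components(edges):
    """Connected components of an undirected graph given as (gene, gene) pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra
    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), []).append(node)
    return list(groups.values())


def to_tsv(groups, genome_of):
    """One row per group, one column per genome, genes joined by commas."""
    genomes = sorted({genome_of(g) for grp in groups for g in grp})
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["group"] + genomes)
    for i, grp in enumerate(sorted(groups, key=sorted), start=1):
        row = [str(i)]
        for genome in genomes:
            row.append(",".join(sorted(g for g in grp if genome_of(g) == genome)))
        writer.writerow(row)
    return buf.getvalue()


# Hypothetical predicted pairs; IDs are "genome|gene".
edges = [("A|g1", "B|g1"), ("B|g1", "C|g7"), ("A|g2", "B|g3"), ("A|g2", "A|g5")]
tsv = to_tsv(components(edges), lambda g: g.split("|")[0])
print(tsv)
```

The second row lists two genes from genome A. In a real tool these would be flagged as in-paralogs rather than silently joined.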
Identification of four families of yCCR4- and Mg2+-dependent endonuclease-related proteins in higher eukaryotes, and characterization of orthologs of yCCR4 with a conserved leucine-rich repeat essential for hCAF1/hPOP2 binding
Full Text Available Abstract Background The yeast yCCR4 factor belongs to the CCR4-NOT transcriptional regulatory complex, in which it interacts, through its leucine-rich repeat (LRR) motif, with yPOP2. Recently, yCCR4 was shown to be a component of the major cytoplasmic mRNA deadenylase complex, and to contain a fold related to the Mg2+-dependent endonuclease core. Results Here, we report the identification of nineteen yCCR4-related proteins in eukaryotes (including yeast, plants and animals), which all contain the yCCR4 endonuclease-like fold, with highly conserved CCR4-specific residues. Phylogenetic and genomic analyses show that they form four distinct families, one of which contains the yCCR4 orthologs. The orthologs in animals possess a leucine-rich repeat domain. We show, using two-hybrid and far-Western assays, that the human member binds to the human yPOP2 homologs, i.e. hCAF1 and hPOP2, in an LRR-dependent manner. Conclusions We have identified the mammalian orthologs of yCCR4 and have shown that the human member binds to the human yPOP2 homologs, thus strongly suggesting conservation of the CCR4-NOT complex from yeast to human. All members of the four identified yCCR4-related protein families show striking conservation of the endonuclease-like catalytic motifs of the yCCR4 C-terminal domain and therefore constitute a new family of potential deadenylases in mammals.
Makarova, Kira; Wolf, Yuri; Koonin, Eugene
With the continuously accelerating genome sequencing from diverse groups of archaea and bacteria, accurate identification of gene orthology and availability of readily expandable clusters of orthologous genes are essential for the functional annotation of new genomes. We report an update of the collection of archaeal Clusters of Orthologous Genes (arCOGs) to cover, on average, 91% of the protein-coding genes in 168 archaeal genomes. The new arCOGs were constructed using refined algorithms for...
Lafond, Manuel; El-Mabrouk, Nadia
Background A variety of methods based on sequence similarity, reconciliation, synteny or functional characteristics can be used to infer orthology and paralogy relations between genes of a given gene family G. But is a given set C of orthology/paralogy constraints possible, i.e., can they simultaneously co-exist in an evolutionary history for G? While previous studies have focused on full sets of constraints, here we consider the general case where C does not necessarily involve a ...
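For the full-constraint case, there is a known necessary condition: every pair of genes is labelled orthologous or paralogous, and the graph whose edges are the orthologous pairs must be a cograph. A cograph is a graph with no induced path on four vertices (P4). The brute-force sketch below checks that condition only. It does not handle the partial-constraint case that this abstract addresses, and it does not check consistency with a species tree. The `relation` helper is a hypothetical convenience.

```python
from itertools import combinations


def relation(pairs):
    """Symmetric orthology predicate built from a list of orthologous pairs."""
    edges = {frozenset(p) for p in pairs}
    return lambda x, y: frozenset((x, y)) in edges


def has_induced_p4(genes, orthologous):
    """True if some four genes induce a path a-b-c-d in the orthology graph."""
    for quad in combinations(genes, 4):
        edges = [(x, y) for x, y in combinations(quad, 2) if orthologous(x, y)]
        if len(edges) != 3:
            continue
        degree = {g: 0 for g in quad}
        for x, y in edges:
            degree[x] += 1
            degree[y] += 1
        # With 3 edges on 4 vertices, degrees (1, 1, 2, 2) occur only for a path.
        if sorted(degree.values()) == [1, 1, 2, 2]:
            return True
    return False


genes = ["a", "b", "c", "d"]
path = relation([("a", "b"), ("b", "c"), ("c", "d")])
star = relation([("a", "b"), ("a", "c"), ("a", "d")])
print(has_induced_p4(genes, path))  # True: this full set cannot be realized
print(has_induced_p4(genes, star))  # False
```

The check is O(n^4) and is meant only to make the condition tangible. Linear-time cograph recognition algorithms exist for real gene families.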
Balaji, Jayashree; Crouch, Jonathan H; Petite, Prasad V N S; Hoisington, David A
A minimal requirement to initiate a comparative genomics study on plant responses to abiotic stresses is a dataset of orthologous sequences. The availability of a large amount of sequence information, including that derived from stress cDNA libraries, allows the identification of stress-related genes and orthologs associated with the stress response. Orthologous sequences serve as tools to explore genes and their relationships across species. For this purpose, ESTs from stress cDNA libraries across 16 crop species including 6 important cereal crops and 10 dicots were systematically collated and subjected to bioinformatics analysis such as clustering, grouping of tentative orthologous sets, identification of protein motifs/patterns in the predicted protein sequence, and annotation with stress conditions, tissue/library source and putative function. All data are available to the scientific community at http://intranet.icrisat.org/gt1/tog/homepage.htm. We believe that the availability of annotated plant abiotic stress ortholog sets will be a valuable resource for researchers studying the biology of environmental stresses in plant systems, molecular evolution and genomics.
Huynen Martijn A
Full Text Available Abstract Background Orthology is one of the cornerstones of gene function prediction. Dividing the phylogenetic relations between genes into either orthologs or paralogs is however an oversimplification. Already in two-species gene-phylogenies, the complicated, non-transitive nature of phylogenetic relations results in inparalogs and outparalogs. For situations with more than two species we lack semantics to specifically describe the phylogenetic relations, let alone to exploit them. Published procedures to extract orthologous groups from phylogenetic trees do not allow identification of orthology at various levels of resolution, nor do they document the relations between the orthologous groups. Results We introduce "levels of orthology" to describe the multi-level nature of gene relations. This is implemented in a program LOFT (Levels of Orthology From Trees) that assigns hierarchical orthology numbers to genes based on a phylogenetic tree. To decide upon speciation and gene duplication events in a tree LOFT can be instructed either to perform classical species-tree reconciliation or to use the species overlap between partitions in the tree. The hierarchical orthology numbers assigned by LOFT effectively summarize the phylogenetic relations between genes. The resulting high-resolution orthologous groups are depicted in colour, facilitating visual inspection of (large) trees. A benchmark for orthology prediction that takes into account the varying levels of orthology between genes shows that the phylogeny-based high-resolution orthology assignments made by LOFT are reliable. Conclusion The "levels of orthology" concept offers high resolution, reliable orthology, while preserving the relations between orthologous groups. A Windows as well as a preliminary Java version of LOFT is available from the LOFT website http://www.cmbi.ru.nl/LOFT.
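The species-overlap option mentioned above can be sketched in a few lines. An internal node is called a duplication when its two child subtrees share at least one species, and a speciation otherwise. This sketch reproduces only that labelling rule, not LOFT's hierarchical numbering. It assumes a binary tree written as nested tuples and gene IDs of the form `species_gene`, both of which are illustrative conventions.

```python
def label_events(tree, species_of):
    """Label internal nodes 'D' (duplication) or 'S' (speciation) by species overlap.

    `tree` is either a gene ID (leaf) or a (left, right) tuple.
    Returns a post-order list of (node, label) pairs.
    """
    labels = []

    def walk(node):
        if isinstance(node, str):
            return {species_of(node)}
        left, right = node
        left_species, right_species = walk(left), walk(right)
        labels.append((node, "D" if left_species & right_species else "S"))
        return left_species | right_species

    walk(tree)
    return labels


# Human and mouse each carry two copies (a, b); fly has one.
tree = ((("human_a", "mouse_a"), ("human_b", "mouse_b")), "fly_a")
labels = label_events(tree, lambda g: g.split("_")[0])
print([label for _, label in labels])  # ['S', 'S', 'D', 'S']
```

Under this labelling, human_a and mouse_a are orthologs at the finest level. All four vertebrate genes are co-orthologs of fly_a at the coarser level, which is the kind of multi-level relation the "levels of orthology" concept describes.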
Poietti, S.; Bertini, L.; Ent, S. van der; Leon Reyes, H.A.; Pieterse, C.M.J.; Tucci, M.; Caporale, C.; Caruso, C.
WRKY proteins are transcription factors involved in many plant processes including plant responses to pathogens. Here, the cross activity of TaWRKY78 from the monocot wheat and AtWRKY20 from the dicot Arabidopsis on the cognate promoters of the orthologous PR4-type genes wPR4e and AtHEL of wheat and
Coumou, Jeroen; Wagemakers, Alex; Trentelman, Jos J.; Nijhof, Ard M.; Hovius, Joppe W.
Human tick-borne diseases that are transmitted by Ixodes ricinus, such as Lyme borreliosis and tick-borne encephalitis, are on the rise in Europe. Diminishing I. ricinus populations in nature can reduce tick exposure to humans, and one way to do so is by developing an anti-vector vaccine against
Lechner, Marcus; Findeiss, Sven; Steiner, Lydia; Marz, Manja; Stadler, Peter F; Prohaska, Sonja J
Orthology analysis is an important part of data analysis in many areas of bioinformatics such as comparative genomics and molecular phylogenetics. The ever-increasing flood of sequence data, and hence the rapidly increasing number of genomes that can be compared simultaneously, calls for efficient software tools, as brute-force approaches with quadratic memory requirements become infeasible in practice. The rapid pace at which new data become available, furthermore, makes it desirable to compute genome-wide orthology relations for a given dataset rather than relying on relations listed in databases. The program Proteinortho described here is a stand-alone tool that is geared towards large datasets and makes use of distributed computing techniques when run on multi-core hardware. It implements an extended version of the reciprocal best alignment heuristic. We apply Proteinortho to compute orthologous proteins in the complete set of all 717 eubacterial genomes available at NCBI at the beginning of 2009. We identified thirty proteins present in 99% of all bacterial proteomes. Proteinortho significantly reduces the required amount of memory for orthology analysis compared to existing tools, allowing such computations to be performed on off-the-shelf hardware.
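Below is a sketch of the strict reciprocal best hit rule that underlies the heuristic. Genes a and b are paired when b is a's best-scoring hit in genome B and a is b's best-scoring hit in genome A. Proteinortho implements an extended version, and that extension is not reproduced here. The nested score dictionaries and their values are hypothetical placeholders for alignment bit scores.

```python
def reciprocal_best_hits(scores_ab, scores_ba):
    """Strict reciprocal best hits between two genomes.

    scores_ab[a][b] is the alignment score of query a (genome A) against
    subject b (genome B); scores_ba holds the reverse searches.
    """
    best_ab = {a: max(hits, key=hits.get) for a, hits in scores_ab.items() if hits}
    best_ba = {b: max(hits, key=hits.get) for b, hits in scores_ba.items() if hits}
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


scores_ab = {"a1": {"b1": 250, "b2": 90}, "a2": {"b1": 120, "b2": 80}}
scores_ba = {"b1": {"a1": 245, "a2": 118}, "b2": {"a1": 88, "a2": 79}}
print(reciprocal_best_hits(scores_ab, scores_ba))  # [('a1', 'b1')]
```

Here a2 is rejected because its best hit, b1, points back to a1. With many genomes, this pairwise test is what makes naive all-against-all storage quadratic, which is the memory cost the abstract refers to.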
Kriventseva, Evgenia V; Tegenfeldt, Fredrik; Petty, Tom J; Waterhouse, Robert M; Simão, Felipe A; Pozdnyakov, Igor A; Ioannidis, Panagiotis; Zdobnov, Evgeny M
Orthology, refining the concept of homology, is the cornerstone of evolutionary comparative studies. With the ever-increasing availability of genomic data, inference of orthology has become instrumental for generating hypotheses about gene functions crucial to many studies. This update of the OrthoDB hierarchical catalog of orthologs (http://www.orthodb.org) covers 3027 complete genomes, including the most comprehensive set of 87 arthropods, 61 vertebrates, 227 fungi and 2627 bacteria (sampling the most complete and representative genomes from over 11,000 available). In addition to the most extensive integration of functional annotations from UniProt, InterPro, GO, OMIM, model organism phenotypes and COG functional categories, OrthoDB uniquely provides evolutionary annotations including rates of ortholog sequence divergence, copy-number profiles, sibling groups and gene architectures. We re-designed the entirety of the OrthoDB website from the underlying technology to the user interface, enabling the user to specify species of interest and to select the relevant orthology level by the NCBI taxonomy. The text searches allow use of complex logic with various identifiers of genes, proteins, domains, ontologies or annotation keywords and phrases. Gene copy-number profiles can also be queried. This release comes with the freely available underlying ortholog clustering pipeline (http://www.orthodb.org/software). © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
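A gene copy-number profile records how many genes each species contributes to an orthologous group. The following minimal sketch assumes a hypothetical `(orthogroup_id, species, gene_id)` membership layout. It does not reflect OrthoDB's actual data format or pipeline. The code builds the profiles and checks whether a group is single-copy in every listed species.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple

# One membership record: (orthogroup_id, species, gene_id). The layout is hypothetical.
Member = Tuple[str, str, str]


def copy_number_profiles(members: Iterable[Member]) -> Dict[str, Counter]:
    """Count how many genes each species contributes to each orthogroup."""
    profiles: DefaultDict[str, Counter] = defaultdict(Counter)
    for group, species, _gene in members:
        profiles[group][species] += 1
    return dict(profiles)


def is_universal_single_copy(profile: Counter, species: Iterable[str]) -> bool:
    """True if every listed species has exactly one gene in the orthogroup.

    A species absent from the profile counts as zero copies (Counter returns 0).
    """
    return all(profile[sp] == 1 for sp in species)


if __name__ == "__main__":
    members = [
        ("G1", "human", "h1"), ("G1", "mouse", "m1"),
        ("G2", "human", "h2"), ("G2", "human", "h3"), ("G2", "mouse", "m2"),
    ]
    profiles = copy_number_profiles(members)
    print(dict(profiles["G2"]))  # {'human': 2, 'mouse': 1}
    print(is_universal_single_copy(profiles["G1"], ["human", "mouse"]))  # True
```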
Lechner, Marcus; Hernandez-Rosales, Maribel; Doerr, Daniel; Wieseke, Nicolas; Thévenin, Annelyse; Stoye, Jens; Hartmann, Roland K.; Prohaska, Sonja J.; Stadler, Peter F.
The elucidation of orthology relationships is an important step both in gene function prediction and towards understanding patterns of sequence evolution. For large datasets, orthology assignments are usually derived directly from sequence similarities, because more exact approaches are too computationally costly. Here we present PoFF, an extension for the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the ...
Zielezinski, Andrzej; Dziubek, Michal; Sliski, Jan; Karlowski, Wojciech M
ORCAN (ORtholog sCANner) is a web-based meta-server for one-click evolutionary and functional annotation of protein sequences. The server combines information from the most popular orthology-prediction resources, including four tools and four online databases. Functional annotation uses five additional comparisons between the query and the identified homologs: sequence similarity, protein domain architectures, functional motifs, Gene Ontology term assignments and a list of associated articles. Furthermore, the server uses a plurality-based rating system to evaluate the orthology relationships and to rank the reference proteins by their evolutionary and functional relevance to the query. Using a dataset of ∼1 million true yeast orthologs as a sample reference set, we show that combining multiple orthology-prediction tools in ORCAN increases sensitivity and precision by 1-2 percentage points. The service is available for free at http://www.combio.pl/orcan/. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved.
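The idea of a plurality-based rating can be shown with a simple vote count. Each resource votes for the reference proteins it predicts as orthologs of the query, and candidates are ranked by vote total. This is only a hypothetical simplification. ORCAN's actual rating also draws on functional comparisons that are not modelled here, and the resource names and protein IDs below are invented.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def plurality_rank(predictions: Dict[str, Set[str]]) -> List[Tuple[str, int]]:
    """Rank candidate reference proteins by how many resources call them orthologs.

    predictions maps a resource name to the set of protein IDs it predicts as
    orthologs of the query. Ties are broken alphabetically by protein ID.
    """
    votes: Counter = Counter()
    for predicted in predictions.values():
        votes.update(predicted)  # each resource contributes one vote per protein
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    predictions = {
        "toolA": {"P1", "P2"},
        "toolB": {"P1"},
        "dbC": {"P1", "P3"},
        "dbD": {"P2"},
    }
    print(plurality_rank(predictions))  # [('P1', 3), ('P2', 2), ('P3', 1)]
```

Because `predictions` stores sets, a resource cannot vote twice for the same protein. The most widely supported candidates therefore rise to the top.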
Dessimoz, Christophe; Boeckmann, Brigitte; Roth, Alexander C J; Gonnet, Gaston H
Correct orthology assignment is a critical prerequisite of numerous comparative genomics procedures, such as function prediction, construction of phylogenetic species trees and genome rearrangement analysis. We present an algorithm for the detection of non-orthologs that arise by mistake in current orthology classification methods based on genome-specific best hits, such as the COGs database. The algorithm works with pairwise distance estimates, rather than computationally expensive and error-prone tree-building methods. The accuracy of the algorithm is evaluated through verification of the distribution of predicted cases, case-by-case phylogenetic analysis and comparisons with predictions from other projects using independent methods. Our results show that a very significant fraction of the COG groups include non-orthologs: using conservative parameters, the algorithm detects non-orthology in a third of all COG groups. Consequently, sequence analysis sensitive to correct orthology assignments will greatly benefit from these findings.
Nichio, Bruno T L; Marchaukoski, Jeroniza Nunes; Raittz, Roberto Tadeu
Nowadays, defining homology relationships among sequences is essential for biological research. Within homology, the analysis of ortholog sequences is of great importance for computational biology, genome annotation and phylogenetic inference. Since 2007, as the number of new sequences deposited in large biological databases has grown, researchers have begun to analyse computational methodologies and tools with the aim of selecting the most promising ones for predicting orthologous groups. The literature in this field describes problems shown by most available tools, such as those related to accuracy, the time required for analysis (especially given the increasing volume of submitted data, which requires faster techniques) and the automation of the process without manual intervention. Searching BMC, Google Scholar, NCBI PubMed and Expasy, we examined more than 600 articles in pursuit of the most recent techniques and tools developed to solve most of the problems that still exist in orthology detection. We listed the main computational tools created and developed between 2011 and 2017, taking into consideration the differences in the type of orthology analysis, outlining the main features of each tool and pointing to the problems that each one tries to address. We also observed that several tools still use the BLAST "all-against-all" methodology as their main algorithm, which entails some limitations, such as a limited number of queries, computational cost, and long processing times to complete the analysis. However, promising new tools are being developed, such as OrthoVenn (which uses a Venn diagram to show the relationship of the ortholog groups generated by its algorithm), proteinOrtho (which improves the accuracy of ortholog groups), ReMark (which tackles the integration of the pipeline to automate the entry process) or OrthAgogue (using algorithms developed to minimize processing
Heijden, R.T.J.M. van der; Snel, B.; Noort, V. van; Huynen, M.A.
BACKGROUND: Orthology is one of the cornerstones of gene function prediction. Dividing the phylogenetic relations between genes into either orthologs or paralogs is however an oversimplification. Already in two-species gene-phylogenies, the complicated, non-transitive nature of phylogenetic
Wolf Yuri I
Background: An evolutionary classification of genes from sequenced genomes that distinguishes between orthologs and paralogs is indispensable for genome annotation and evolutionary reconstruction. Shortly after multiple genome sequences of bacteria, archaea, and unicellular eukaryotes became available, an attempt at such a classification was implemented in Clusters of Orthologous Groups of proteins (COGs). Rapid accumulation of genome sequences creates opportunities for refining COGs but also represents a challenge because of error amplification. One practical strategy involves the construction of refined COGs for phylogenetically compact subsets of genomes. Results: New Archaeal Clusters of Orthologous Genes (arCOGs) were constructed for 41 archaeal genomes (13 Crenarchaeota, 27 Euryarchaeota and one Nanoarchaeon) using an improved procedure that employs a similarity tree between smaller, group-specific clusters, semi-automatically partitions orthology domains in multidomain proteins, and uses profile searches for identification of remote orthologs. The annotation of arCOGs is a consensus between three assignments based on the COGs, the CDD database, and the annotations of homologs in the NR database. The 7538 arCOGs, on average, cover ~88% of the genes in a genome, compared to ~76% coverage in COGs. The finer granularity of ortholog identification in the arCOGs is apparent from the fact that 4538 arCOGs correspond to 2362 COGs; ~40% of the arCOGs are new. The archaeal gene core (protein-coding genes found in all 41 genomes) consists of 166 arCOGs. The arCOGs were used to reconstruct gene loss and gene gain events during archaeal evolution and gene sets of ancestral forms. The Last Archaeal Common Ancestor (LACA) is conservatively estimated to possess 996 genes, compared to 1245 and 1335 genes for the last common ancestors of Crenarchaeota and Euryarchaeota, respectively. It is inferred that LACA was a chemoautotrophic hyperthermophile
BACKGROUND: Accurate identification of orthologs is crucial for evolutionary studies and for functional annotation. Several algorithms have been developed for ortholog delineation, but so far, manually curated genome-scale biological databases of orthologous genes for algorithm evaluation have been lacking. We evaluated four popular ortholog prediction algorithms (MultiParanoid; OrthoMCL; RBH: Reciprocal Best Hit; RSD: Reciprocal Smallest Distance; the last two extended into the clustering algorithms cRBH and cRSD, respectively, so that they can predict orthologs across multiple taxa) against a set of 2,723 groups of high-quality curated orthologs from 6 Saccharomycete yeasts in the Yeast Gene Order Browser. RESULTS: Examination of sensitivity [TP/(TP+FN)], specificity [TN/(TN+FP)], and accuracy [(TP+TN)/(TP+TN+FP+FN)] across a broad parameter range showed that cRBH was the most accurate and specific algorithm, whereas OrthoMCL was the most sensitive. Evaluation of the algorithms across a varying number of species showed that cRBH had the highest accuracy and lowest false discovery rate [FP/(FP+TP)], followed by cRSD. Of the six species in our set, three descended from an ancestor that underwent whole genome duplication. Subsequent differential duplicate loss events in the three descendants resulted in distinct classes of gene loss patterns, including cases where the genes retained in the three descendants are paralogs, constituting 'traps' for ortholog prediction algorithms. We found that the false discovery rate of all algorithms increased dramatically in these traps. CONCLUSIONS: These results suggest that simple algorithms, like cRBH, may be better ortholog predictors than more complex ones (e.g., OrthoMCL and MultiParanoid) for evolutionary and functional genomics studies where the objective is the accurate inference of single-copy orthologs (e.g., molecular phylogenetics), but that all algorithms fail to accurately predict orthologs when paralogy
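The four evaluation metrics quoted in the abstract follow directly from the confusion-matrix counts. The abstract does not say whether TP, FP, TN and FN are counted over gene pairs or over groups, so the counts below are purely hypothetical. Returning NaN for an empty denominator is a design choice of this sketch.

```python
from dataclasses import dataclass


def _ratio(num: int, den: int) -> float:
    # Return NaN rather than raising when a metric is undefined (empty denominator).
    return num / den if den else float("nan")


@dataclass(frozen=True)
class Confusion:
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives

    @property
    def sensitivity(self) -> float:  # TP/(TP+FN)
        return _ratio(self.tp, self.tp + self.fn)

    @property
    def specificity(self) -> float:  # TN/(TN+FP)
        return _ratio(self.tn, self.tn + self.fp)

    @property
    def accuracy(self) -> float:  # (TP+TN)/(TP+TN+FP+FN)
        return _ratio(self.tp + self.tn, self.tp + self.tn + self.fp + self.fn)

    @property
    def false_discovery_rate(self) -> float:  # FP/(FP+TP)
        return _ratio(self.fp, self.fp + self.tp)


if __name__ == "__main__":
    c = Confusion(tp=80, fp=20, tn=890, fn=10)  # hypothetical counts
    print(round(c.sensitivity, 3), round(c.specificity, 3), c.accuracy, c.false_discovery_rate)
```

With these counts, accuracy is 0.97 while the false discovery rate is 0.2. When true negatives dominate, accuracy can remain high even though one in five predicted orthologs is wrong.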
Cenci, Albero; Guignon, Valentin; Roux, Nicolas; Rouard, Mathieu
Identifying the molecular mechanisms underlying tolerance to abiotic stresses is important in crop breeding. A comprehensive understanding of the gene families associated with drought tolerance is therefore highly relevant. NAC transcription factors form a large plant-specific gene family involved in the regulation of tissue development and responses to biotic and abiotic stresses. The main goal of this study was to set up a framework of orthologous groups determined by an expert sequence comparison of NAC genes from both monocots and dicots. In order to clarify the orthologous relationships among NAC genes of different species, we performed an in-depth comparative study of four divergent taxa, in dicots and monocots, whose genomes have already been completely sequenced: Arabidopsis thaliana, Vitis vinifera, Musa acuminata and Oryza sativa. Due to independent evolution, NAC copy number is highly variable in these plant genomes. Based on an expert NAC sequence comparison, we propose forty orthologous groups of NAC sequences that were probably derived from an ancestor gene present in the most recent common ancestor of dicots and monocots. These orthologous groups provide a curated resource for large-scale protein sequence annotation of NAC transcription factors. The established orthology relationships also provide a useful reference for NAC function studies in newly sequenced genomes such as M. acuminata and other plant species.
Gains and losses shape the gene complement of animal lineages and are a fundamental aspect of genomic evolution. Acquiring a comprehensive view of the evolution of gene repertoires is hampered by the intrinsic limitations of common sequence similarity searches and available databases. Thus, a subset of the gene complement of an organism consists of hidden orthologs, i.e., those with no apparent homology to sequenced animal lineages—mistakenly considered new genes—but actually representing rapidly evolving orthologs or undetected paralogs. Here, we describe Leapfrog, a simple automated BLAST pipeline that leverages increased taxon sampling to overcome long evolutionary distances and identify putative hidden orthologs in large transcriptomic databases by transitive homology. As a case study, we used 35 transcriptomes of 29 flatworm lineages to recover 3427 putative hidden orthologs, some unidentified by OrthoFinder and HaMStR, two common orthogroup inference algorithms. Unexpectedly, we do not observe a correlation between the number of putative hidden orthologs in a lineage and its “average” evolutionary rate. Hidden orthologs do not show unusual sequence composition biases that might account for systematic errors in sequence similarity searches. Instead, gene duplication with divergence of one paralog and weak positive selection appear to underlie hidden orthology in Platyhelminthes. By using Leapfrog, we identify key centrosome-related genes and homeodomain classes previously reported as absent in free-living flatworms, e.g., planarians. Altogether, our findings demonstrate that hidden orthologs comprise a significant proportion of the gene repertoire in flatworms, qualifying the impact of gene losses and gains in gene complement evolution. PMID:28400424
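Transitive homology lets an intermediate ("bridge") transcriptome connect a query to a distant target that it does not hit directly. The sketch below shows that idea using reciprocal best hits through a single bridge. It is a hypothetical simplification, not Leapfrog's code. The best-hit dictionaries, for example as parsed from BLAST tabular output, the function names, and the reciprocity rule are all assumptions.

```python
from typing import Dict, Optional, Tuple

# gene -> its best hit in another gene set (assumed to be precomputed, e.g. from BLAST).
BestHits = Dict[str, str]


def reciprocal_hit(forward: BestHits, backward: BestHits, gene: str) -> Optional[str]:
    """Return gene's best hit if that hit's own best hit is gene, else None."""
    hit = forward.get(gene)
    if hit is not None and backward.get(hit) == gene:
        return hit
    return None


def leapfrog_hit(
    query: str,
    query_to_target: BestHits,
    query_to_bridge: BestHits,
    bridge_to_query: BestHits,
    bridge_to_target: BestHits,
    target_to_bridge: BestHits,
) -> Optional[Tuple[str, str]]:
    """Find a putative hidden ortholog of query in the target via a bridge set.

    Returns (bridge_gene, target_gene), or None if no transitive link exists.
    """
    if query in query_to_target:
        return None  # directly detectable, so not "hidden"
    bridge_gene = reciprocal_hit(query_to_bridge, bridge_to_query, query)
    if bridge_gene is None:
        return None
    target_gene = reciprocal_hit(bridge_to_target, target_to_bridge, bridge_gene)
    if target_gene is None:
        return None
    return bridge_gene, target_gene


if __name__ == "__main__":
    # q1 has no direct hit in the target, but reaches t1 through bridge gene b1.
    print(leapfrog_hit("q1", {}, {"q1": "b1"}, {"b1": "q1"}, {"b1": "t1"}, {"t1": "b1"}))
```

Requiring reciprocity at both hops is one conservative choice. A real pipeline would also need score thresholds and a way to combine evidence from several bridge taxa.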
Goodstadt, Leo; Ponting, Chris P
Accurate predictions of orthology and paralogy relationships are necessary to infer human molecular function from experiments in model organisms. Previous genome-scale approaches to predicting these relationships have been limited by their use of protein similarity and their failure to take into account multiple splicing events and gene prediction errors. We have developed PhyOP, a new phylogenetic orthology prediction pipeline based on synonymous rate estimates, which accurately predicts orthology and paralogy relationships for transcripts, genes, exons, or genomic segments between closely related genomes. We were able to identify orthologue relationships to human genes for 93% of all dog genes from Ensembl. Among 1:1 orthologues, the alignments covered a median of 97.4% of protein sequences, and 92% of orthologues shared essentially identical gene structures. PhyOP accurately recapitulated genomic maps of conserved synteny. Benchmarking against predictions from Ensembl and Inparanoid showed that PhyOP is more accurate, especially in its predictions of paralogy. Nearly half (46%) of PhyOP paralogy predictions are unique. Using PhyOP to investigate orthologues and paralogues in the human and dog genomes, we found that the human assembly contains 3-fold more gene duplications than the dog. Species-specific duplicate genes, or "in-paralogues," are generally shorter and have fewer exons than 1:1 orthologues, which is consistent with selective constraints and mutation biases based on the sizes of duplicated genes. In-paralogues have experienced elevated amino acid and synonymous nucleotide substitution rates. Duplicates possess similar biological functions for either the dog or human lineages. Having accounted for 2,954 likely pseudogenes and gene fragments, and after separating 346 erroneously merged genes, we estimated that the human genome encodes a minimum of 19,700 protein-coding genes, similar to the gene count of nematode worms. PhyOP is a fast and robust