Find the possible protein domains and biological functional sites of this protein sequence.

Source: 百度文库 (Baidu Wenku)  Editor: 中科新闻网  Time: 2024/04/29 08:34:57
The sequence is:
MMATQHTQYPDARLSSPIVLDQCDLVTRACGLYSEYSLNPKLRTCRLPKHIYRLKYDAIV
LRFISDVPVATIPIDYIAPMLINVLADSKNAPLEPPCLSFLDEIVNYTVQDAAFLNYYMN
QIKTQEGVITDQLKQNIRRVIHKNRYLSALFFWHDLSILTRRGRMNRGNVRSTWFVTNEV
VDILGYGDYIFWKIPIALLPMNSANVPHASTDWYQPNIFKEAIQGHTHIISVSTAEVLIM
CKDLVTSRFNTLLIAELARLEDPVSADYPLVDDIQSLYNAGDYLLSILGSEGYQIIKYLE
PLCLAKIQLCSQYTERKGRFLTQMHLAVIQTLRELLLNRGLKKSQLSKIREFHQLLLRLR
STPQQLCELFSIQKHWGHPVLHSEKAIQKVKNHATVLKALRPIIIFETYCVFKYSVAKHF
FDSQGTWYSVISDRCLTPGLNSYIRRNQFPPLPMIKDLLWEFYHLDHPPLFSTKIISDLS
IFIKDRATAVEQTCWDAVFEPNVLGYSPPYRFNTKRVPEQFLEQEDFSIESVLQYAQELR
YLLPQNRNFSFSLKEKELNVGRTFGKLPYLTRNVQTLCEALLADGLAKAFPSNMMVVTER
EQKESLLHQASWHHTSDDFGEHATVRGSSFVTDLEKYNLAFRYEFTAPFIKYCNQCYGVR
NVFDWMHFLIPQCYMHVSDYYNPPHNVTLENREYPPEGPSAYRGHLGGIEGLQQKLWTSI
SCAQISLVEIKTGFKLRSAVMGDNQCITVLSVFPLESSPNEQERCAEDNAARVAASLAKV
TSACGIFLKPDETFVHSGFIYFGPKQYLNGIQLPQSLKTAARMAPLSDAIFDDLQGTLAS
IGTAFERSISETRHILPSRVAAAFHTYFSVRILQHHHLGFHKGSDLGQLAINKPLDFGTI
ALSLAVPQVLGGLSFLNPEKCLYRNLGDPVTSGLFQLKHYLSMVGMSDIFHALVAKSPGN
CSAIDFVLNPGGLNVPGSQDLTSFLRQIVRRSITLSARNKLINTLFHASADLEDELVCKW
LLSSTPVMSRFAADIFSRTPSGKRLQILGYLEGTRTLLASKMISNNAETPILERLRKITL
QRWNLWFSYLDHCDSALMEAIQPIRCTVDIAQILREYSWAHILGGRQLIGATLPCIPEQF
QTTWLKPYEQCVECSSTNNSSPYVSVALKRNVVSAWPDASRLGWTIGDGIPYIGSRTEDK
IGQPAIKPRCPSAALREAIELTSRLTWVTQGSANSDQLIRPFLEARVNLSVQEILQMTPS
HYSGNIVHRYNDQYSPHSFMANRMSNTATRLMVSTNTLGEFSGGGQAARDSNIIFQNVIN
FAVALYDIRFRNTCTSSIQYHRAHIHLTDCCTREVPAQYLTYTTTLNLDLSKYRNNELIY
DSEPLRGGLNCNLSIDSPLMKGPRLNIIEDDLIRLPHLSGWELAKTVLQSIISDSSNSST
DPISSGETRSFTTHFLTYPKIGLLYSFGALISFYLGNTILCTKKIGLTEFLYYLQNQIHN
LSHRSLRIFKPTFRHSSVMSRLMDIDPNFSIYIGGTAGDRGLSDAARLFLRIAISTFLSF
VEEWVIFRKANIPLWVVYPLEGQRPDPPGEFLNRVKSLIVGIEDDKNKGSILSRSEEKCS
SNLVYNCKSTASNFFHASLAYWRGRHRPKKTIGATKATTAPHIILPLGNSDRPPGLDLNQ
SNDTFIPTRIKQIVQGDSRNDRTTTTRLPPQSRSTPTSATEPPTKIYEGSTTYRGKSTDT
HLDEGHNAKEFPFNPHRLVVPFFKLTKDGEYSIEPSPEESRSNIKGLLQHLRTMVDTTIY
CRFTGIVSSMHYKLDEVLWEYNKFESAVTLAEGEGSGALLLIQKYGVKKLFLNTLATEHS
IESEVISGYTTPRMLLSVMPRTHRGELEVILNNSASQITDITHRDWFSNQKNRIPNDVDI
ITMDAETTENLDRSRLYEAVYTIICNHINPKTLKVVILKVFLSDLDGMCWINNYLAPMFG
SGYLIKPITSSARSSEWYLCLSNLLSTLRTTQHQTQANCLHVVQCALQQQVQRGSYWLSH
LTKYTTSRLHNSYIAFGFPSLEKVLYHRYNLVDSRNGPLVSITRHLALLQTEIRELVTDY
NQLRQSRTQTYHFIKTSKGRITKLVNDYLRFELVIRALKNNSTWHHELYLLPELIGVCHR
FNHTRNCTCSERFLVQTLYLHRMSDAEIKLMDRLTSLVNMFPEGFRSSSV

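Try this site, it does a professional analysis: http://www.compbio.dundee.ac.uk/~www-jpred/
Steps:
1.1 Go to JPred: http://www.compbio.dundee.ac.uk/~www-jpred/
1.2 Click Prediction (Submit a protein sequence for secondary structure prediction)
1.3 Choose to receive the results by Email (recommended), or leave the field blank to have the results shown on the web page
1.4 Enter the protein sequence (raw sequence)
1.5 Set the three File format options
1.6 Click Run to submit
1.7 Find the results link in your mailbox. On the results page that opens, choose item 3 (Your results in HTML can be found here.) or item 4 (A simple display of your query sequence and the prediction can be found here.) for a quick look at the results, and item 5 (Postscript output can be found here.) for graphical output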
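Step 1.4 asks for the raw sequence, so it helps to strip line breaks, spaces and any position numbers before pasting. Below is a minimal Python sketch of that clean-up. The function names `clean_sequence` and `to_fasta` and the record name `query` are my own, for illustration only, and are not part of JPred.

```python
# Minimal sketch: turn pasted, line-wrapped protein text into one raw sequence.
# clean_sequence / to_fasta are hypothetical helper names, not a JPred API.

VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def clean_sequence(text):
    """Drop whitespace and digits, upper-case, and reject non-standard codes."""
    seq = "".join(ch for ch in text.upper() if ch.isalpha())
    unexpected = sorted(set(seq) - VALID_RESIDUES)
    if unexpected:
        raise ValueError(f"unexpected residue codes: {unexpected}")
    return seq


def to_fasta(name, seq, width=60):
    """Wrap a raw sequence into FASTA text with `width` residues per line."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    pasted = "mmatqhtqyp\n 11 DARLSSPIVL\n"
    raw = clean_sequence(pasted)
    print(raw)
    print(to_fasta("query", raw, width=10))
```

The raw string is what goes into the sequence box. The FASTA form is only there in case you want to keep a copy on disk.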

PS00005 PKC_PHOSPHO_SITE Protein kinase C phosphorylation site :
44 - 46: TcR

160 - 162: TrR

246 - 248: TsR

314 - 316: TeR

331 - 333: TlR

383 - 385: SeK

432 - 434: SdR

472 - 474: StK

514 - 516: TkR

552 - 554: SlK

598 - 600: TeR

624 - 626: TvR

816 - 818: SlK

869 - 871: SvR

996 - 998: SaR

1041 - 1043: SgK

1222 - 1224: TsR

1482 - 1484: TkK

1502 - 1504: ShR

1505 - 1507: SlR

1512 - 1514: TfR

1670 - 1672: SdR

1705 - 1707: TtR

1732 - 1734: TyR

1871 - 1873: TpR

1882 - 1884: ThR

1902 - 1904: ThR

1952 - 1954: TlK

1991 - 1993: SaR

2007 - 2009: TlR

2046 - 2048: TsR

2116 - 2118: TsK

2170 - 2172: SeR
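
The PKC hits listed above can be reproduced locally with a regular expression. This is only a sketch: I am assuming the PROSITE pattern for PS00005 is [ST]-x-[RK] (check the PROSITE entry to be sure), and `find_sites` is a name I made up. Only the first 60 residues of the query are used, to keep the example short.

```python
import re

# First 60 residues of the query sequence, copied from the question.
FIRST_60 = "MMATQHTQYPDARLSSPIVLDQCDLVTRACGLYSEYSLNPKLRTCRLPKHIYRLKYDAIV"

# Assumed PROSITE pattern for PS00005: [ST]-x-[RK], written as a Python regex.
PKC_REGEX = "[ST].[RK]"


def find_sites(seq, regex):
    """Return (start, end, match) tuples using 1-based, inclusive positions."""
    # The lookahead lets matches that start inside an earlier match be reported too.
    pattern = re.compile("(?=(" + regex + "))")
    hits = []
    for m in pattern.finditer(seq):
        start = m.start() + 1
        hits.append((start, start + len(m.group(1)) - 1, m.group(1)))
    return hits


if __name__ == "__main__":
    for start, end, match in find_sites(FIRST_60, PKC_REGEX):
        print(f"{start} - {end}: {match}")  # prints "44 - 46: TCR"
```

Such short patterns match by chance very often, so a long list of hits like this one says little about real function on its own.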

PS00006 CK2_PHOSPHO_SITE Casein kinase II phosphorylation site :
99 - 102: SflD

108 - 111: TvqD

233 - 236: StaE

331 - 334: TlrE

488 - 491: TavE

493 - 496: TcwD

552 - 555: SlkE

576 - 579: TlcE

598 - 601: TerE

615 - 618: TsdD

632 - 635: TdlE

726 - 729: SlvE

758 - 761: SpnE

843 - 846: TafE

848 - 851: SisE

962 - 965: SaiD

1088 - 1091: SylD

1185 - 1188: TigD

1195 - 1198: SrtE

1250 - 1253: SvqE

1297 - 1300: TlgE

1419 - 1422: SgwE

1438 - 1441: SstD

1444 - 1447: SsgE

1536 - 1539: TagD

1559 - 1562: SfvE

1613 - 1616: SrsE

1698 - 1701: SrnD

1718 - 1721: SatE

1740 - 1743: ThlD

1776 - 1779: SpeE

1793 - 1796: TmvD

1829 - 1832: TlaE

1902 - 1905: ThrD

1963 - 1966: SdlD

2168 - 2171: TcsE

2184 - 2187: SdaE

PS00001 ASN_GLYCOSYLATION N-glycosylation site :
106 - 109: NYTV

548 - 551: NFSF

686 - 689: NVTL

960 - 963: NCSA

1158 - 1161: NNSS

1248 - 1251: NLSV

1392 - 1395: NLSI

1437 - 1440: NSST

1500 - 1503: NLSH

1528 - 1531: NFSI

1679 - 1682: NQSN

1682 - 1685: NDTF

1892 - 1895: NNSA

2140 - 2143: NNST

2141 - 2144: NSTW

2162 - 2165: NHTR

2166 - 2169: NCTC
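
Two of the N-glycosylation hits above overlap (2140 - 2143 and 2141 - 2144), which a plain left-to-right regex search would miss. The sketch below shows the difference on residues 2101-2160 of the query. I am assuming the PS00001 pattern is N-{P}-[ST]-{P}, where {P} means "anything except proline", and `scan` is a hypothetical helper name.

```python
import re

# Residues 2101-2160 of the query sequence, copied from the question.
FRAGMENT = "NQLRQSRTQTYHFIKTSKGRITKLVNDYLRFELVIRALKNNSTWHHELYLLPELIGVCHR"
OFFSET = 2100  # number of residues that come before FRAGMENT

# Assumed PROSITE pattern N-{P}-[ST]-{P}; the exclusion {P} becomes [^P].
N_GLYC_REGEX = "N[^P][ST][^P]"


def scan(seq, regex, offset=0, overlapping=True):
    """Return (start, end, match) with 1-based positions shifted by `offset`."""
    # With overlapping=True the pattern sits inside a zero-width lookahead,
    # so the search resumes one residue after each hit's start.
    wrapped = "(?=(" + regex + "))" if overlapping else "(" + regex + ")"
    hits = []
    for m in re.finditer(wrapped, seq):
        start = m.start(1) + 1 + offset
        hits.append((start, start + len(m.group(1)) - 1, m.group(1)))
    return hits


if __name__ == "__main__":
    print(scan(FRAGMENT, N_GLYC_REGEX, OFFSET, overlapping=False))
    print(scan(FRAGMENT, N_GLYC_REGEX, OFFSET, overlapping=True))
```

Without the lookahead only NNST at 2140 is reported. With it, NSTW at 2141 shows up as well, matching the list above.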

PS00008 MYRISTYL N-myristoylation site :
168 - 173: GNvrST

340 - 345: GLkkSQ

425 - 430: GTwySV

585 - 590: GLakAF

707 - 712: GGieGL

836 - 841: GTlaSI

883 - 888: GSdlGQ

959 - 964: GNcsAI

1231 - 1236: GSanSD

1303 - 1308: GGgqAA

1304 - 1309: GGqaAR

1387 - 1392: GGlnCN

1388 - 1393: GLncNL

1462 - 1467: GLlySF

1468 - 1473: GAliSF

1534 - 1539: GGtaGD

1541 - 1546: GLsdAA

1609 - 1614: GSilSR

1653 - 1658: GAtkAT

1675 - 1680: GLdlNQ

1805 - 1810: GIvsSM

PS00003 SULFATION Tyrosine sulfation site :
261 - 275: edpvsadYplvddiq

687 - 701: vtlenreYppegpsa

1373 - 1387: yrnneliYdseplrg

1764 - 1778: kltkdgeYsiepspe

PS00009 AMIDATION Amidation site :
1041 - 1044: sGKR

PS00004 CAMP_PHOSPHO_SITE cAMP- and cGMP-dependent protein kinase phosphorylation site :
1076 - 1079: RKiT

PS00007 TYR_PHOSPHO_SITE Tyrosine kinase phosphorylation site :
1374 - 1380: Rnn.Eli.Y

1764 - 1771: KltkDge.Y

1792 - 1800: RtmvDttiY

1935 - 1941: Rly.Eav.Y

PS00029 LEUCINE_ZIPPER Leucine zipper pattern :
1480 - 1501: LctkkigLteflyyLqnqihnL
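
Most of the entries above are simple PROSITE patterns, so they can be turned into regular expressions mechanically. Here is a rough converter as a sketch. The pattern strings in it are written from my memory of the PROSITE entries and should be treated as assumptions, and `prosite_to_regex` is a name I made up. It only handles plain elements (x, single residues, [..], {..}, and repeat counts), not the < and > anchors. PS00003 (sulfation) is left out because, as far as I know, it is defined by a rule rather than a pattern.

```python
import re

# One pattern element: x, a residue, [allowed] or {excluded}, plus optional (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Convert a simple PROSITE pattern such as '[ST]-x(2)-[DE]' to a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError("unsupported pattern element: " + element)
        core, low, high = m.groups()
        if core == "x":
            core = "."                         # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"     # excluded residues
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


# Pattern strings as I recall them; verify against PROSITE before relying on them.
ASSUMED_PATTERNS = {
    "PS00008 MYRISTYL": "G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}",
    "PS00007 TYR_PHOSPHO_SITE": "[RK]-x(2,3)-[DE]-x(2,3)-Y",
    "PS00029 LEUCINE_ZIPPER": "L-x(6)-L-x(6)-L-x(6)-L",
}

if __name__ == "__main__":
    for name, pattern in ASSUMED_PATTERNS.items():
        print(name, "->", prosite_to_regex(pattern))
```

For example, the leucine zipper pattern becomes L.{6}L.{6}L.{6}L, which matches the 22-residue hit at 1480 - 1501 once the hit is upper-cased.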

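This question is actually rather interesting. Based on sequence similarity, NCBI annotates the protein as "RNA-dependent RNA polymerase", yet it carries no annotation for the corresponding domain.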