Proteomics Identifications Database
PRIDE (PRoteomics IDEntifications database) is a public data repository of mass spectrometry-based proteomics data, maintained by the Proteomics Team at the European Bioinformatics Institute.[1]
Originally designed by Lennart Martens in 2003 during a stay at the European Bioinformatics Institute as a Marie Curie fellow of the European Commission in the "Quality of Life" Programme (Contract number: QLRI-1999-50595), PRIDE was established as a production service in 2005.[2] The original grant application document from June 2003 to start construction of PRIDE has since been published in a viewpoint article.[3][4] Several similar proteomics databases have been built, including the GPMDB, PeptideAtlas, Proteinpedia and the NCBI Peptidome.[1]
The PRIDE database constitutes a structured data repository, and stores the original experimental data from the researchers without editorial control over the submitted data.
In total, PRIDE contains data from about 60 species, the biggest fraction of it coming from human samples (including the data from the two draft human proteomes[5][6]) followed by the fruit fly Drosophila melanogaster and mouse.[1]
Formats and the submission process
Since detailed proteomics data currently cannot be curated from the existing literature, the source of PRIDE data is solely submissions by academic researchers.
PRIDE is a standards-compliant public repository, meaning that its own XML-based data exchange format for submissions, PRIDE XML, was built around the Proteomics Standards Initiative mzData standard for mass spectrometry. Recently, PRIDE has been adapted to work with the modern mzML[7] and mzIdentML[8] standards of the Proteomics Standards Initiative.[9] An additional format, dubbed mzTab, can be used as a simplified way to submit quantitative proteomics data.[10]
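As a rough illustration of why mzTab is considered a simplified format, the sketch below reads a small tab-separated fragment in Python. It assumes the line-prefix convention of the mzTab specification (MTD for metadata, PRH for the protein header, PRT for protein rows). The sample content and the function name read_mztab are hypothetical, and the sketch ignores the peptide, PSM and small-molecule sections.

```python
# Illustrative fragment only; the accessions and descriptions are made up.
SAMPLE = (
    "MTD\tmzTab-version\t1.0.0\n"
    "MTD\tdescription\tHypothetical example\n"
    "PRH\taccession\tdescription\n"
    "PRT\tP12345\tExample protein\n"
    "PRT\tQ67890\tAnother protein\n"
)


def read_mztab(text):
    """Return (metadata dict, list of protein rows) from mzTab-like text."""
    metadata = {}
    header = None
    proteins = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        prefix, *fields = line.split("\t")
        if prefix == "MTD" and len(fields) >= 2:
            # Metadata lines carry a key and a value.
            metadata[fields[0]] = fields[1]
        elif prefix == "PRH":
            # The protein header names the columns of the PRT rows.
            header = fields
        elif prefix == "PRT" and header is not None:
            proteins.append(dict(zip(header, fields)))
    return metadata, proteins


metadata, proteins = read_mztab(SAMPLE)
```

Each protein row comes back as a dictionary keyed by the column names from the header line, so the table can be inspected without an XML parser.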
Because many different mass spectrometry instruments and software formats are on the market, wet-lab scientists without a strong bioinformatics background or informatics support had problems converting their data to PRIDE XML. PRIDE Converter was developed to address this.[11] PRIDE Converter is a tool, written in the Java programming language, that converts 15 different input mass spectrometry data formats into PRIDE XML via a wizard-like graphical user interface. It is freely available and open source under the permissive Apache License. A new version was released in 2012 as PRIDE Converter 2.[12] This version was a complete rewrite, focused on easy adaptability to different (and evolving) data sources.
Browsing, searching and data mining PRIDE
Currently, data can be queried from PRIDE via the PRIDE web interface, through the stand-alone Java client PRIDE Inspector,[13] or coupled directly to several search engines through PeptideShaker.[14] Moreover, a new RESTful API allows convenient programmatic access to the PRIDE archive.[15]
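A minimal sketch of programmatic access over HTTP might look as follows in Python. The base URL, the /projects/{accession} path and the JSON field names ("accession", "title") are assumptions made for illustration and should be checked against the current PRIDE Archive web service documentation. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of the PRIDE Archive web service; verify before use.
BASE_URL = "https://www.ebi.ac.uk/pride/ws/archive/v2"


def project_url(accession, base_url=BASE_URL):
    """Build the URL for one project record from its accession."""
    return f"{base_url}/projects/{urllib.parse.quote(accession, safe='')}"


def summarize(record):
    """Pick a few fields from a decoded project record (field names assumed)."""
    return {
        "accession": record.get("accession"),
        "title": record.get("title"),
    }


def fetch_project(accession, timeout=30):
    """Download and decode one project record (requires network access)."""
    request = urllib.request.Request(
        project_url(accession), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Under these assumptions, summarize(fetch_project("PXD000001")) would return a short dictionary for that project accession. Keeping URL construction and field selection separate from the network call lets both be checked offline.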
The extensive use of controlled vocabularies (CVs) and ontologies for flexible yet context-sensitive annotation of data, along with the ability to perform intelligent queries by these annotations, are key features of PRIDE.[16]
Involvement in ProteomeXchange
The ProteomeXchange consortium has been set up to provide a coordinated submission of MS proteomics data to the main existing proteomics repositories, and to encourage optimal data dissemination.[17] The consortium contains several member databases, including PRIDE and PeptideAtlas. The earliest conception of ProteomeXchange stems from a meeting at the HUPO 2005 conference in Munich,[18] where the main proteomics data repositories at the time agreed in principle to exchange their data, and thus provide a means for the user to find public proteomics data at any of the participating databases. Due to the rapid development of the field, and the need to first develop suitable standards for data exchange, it took almost ten years from that meeting to actually implement this system, an effort that was funded by the 'ProteomeXchange' Coordination Action grant of the European Commission's Seventh Framework Programme.[19]
Data recovery after the discontinuation of Peptidome
The NCBI Peptidome database was discontinued in 2011, yet a joint effort by the PRIDE and Peptidome teams resulted in the transfer of all Peptidome data to PRIDE.[20][21][22]
References
- ^ Vizcaíno, JA; Côté, R; Reisinger, F; Barsnes, H; Foster, JM; Rameseder, J; Hermjakob, H; Martens, L (2010). "The Proteomics Identifications database: 2010 update". Nucleic Acids Res. 38 (Database): D736–42. doi:10.1093/nar/gkp964. PMC 2808904. PMID 19906717.
- ^ Martens, L; Hermjakob, H; Jones, P; Adamski, M; Taylor, C; States, D; Gevaert, K; Vandekerckhove, J; Apweiler, R (Aug 2005). "PRIDE: The PRoteomics IDEntifications database". Proteomics. 5 (13): 3537–45. doi:10.1002/pmic.200401303. PMID 16041671. S2CID 28998489.
- ^ "Application for Training at the EMBL-EBI EU Marie Curie Training Site" (PDF).
- ^ Martens, Lennart (March 2016). "Public proteomics data: how the field has evolved from sceptical inquiry to the promise of in silico proteomics". EuPA Open Proteomics. 11: 42–44. doi:10.1016/j.euprot.2016.02.005. PMC 5988554. PMID 29900110.
- ^ Wilhelm, M; Schlegl, J; Hahne, H; Moghaddas Gholami, A; Lieberenz, M; Savitski, MM; Ziegler, E; Butzmann, L; Gessulat, S; Marx, H; Mathieson, T; Lemeer, S; Schnatbaum, K; Reimer, U; Wenschuh, H; Mollenhauer, M; Slotta-Huspenina, J; Boese, JH; Bantscheff, M; Gerstmair, A; Faerber, F; Kuster, B (29 May 2014). "Mass-spectrometry-based draft of the human proteome". Nature. 509 (7502): 582–7. Bibcode:2014Natur.509..582W. doi:10.1038/nature13319. PMID 24870543. S2CID 4467721.
- ^ Kim, MS; Pinto, SM; Getnet, D; Nirujogi, RS; Manda, SS; Chaerkady, R; Madugundu, AK; Kelkar, DS; Isserlin, R; Jain, S; Thomas, JK; Muthusamy, B; Leal-Rojas, P; Kumar, P; Sahasrabuddhe, NA; Balakrishnan, L; Advani, J; George, B; Renuse, S; Selvan, LD; Patil, AH; Nanjappa, V; Radhakrishnan, A; Prasad, S; Subbannayya, T; Raju, R; Kumar, M; Sreenivasamurthy, SK; Marimuthu, A; Sathe, GJ; Chavan, S; Datta, KK; Subbannayya, Y; Sahu, A; Yelamanchi, SD; Jayaram, S; Rajagopalan, P; Sharma, J; Murthy, KR; Syed, N; Goel, R; Khan, AA; Ahmad, S; Dey, G; Mudgal, K; Chatterjee, A; Huang, TC; Zhong, J; Wu, X; Shaw, PG; Freed, D; Zahari, MS; Mukherjee, KK; Shankar, S; Mahadevan, A; Lam, H; Mitchell, CJ; Shankar, SK; Satishchandra, P; Schroeder, JT; Sirdeshmukh, R; Maitra, A; Leach, SD; Drake, CG; Halushka, MK; Prasad, TS; Hruban, RH; Kerr, CL; Bader, GD; Iacobuzio-Donahue, CA; Gowda, H; Pandey, A (29 May 2014). "A draft map of the human proteome". Nature. 509 (7502): 575–81. Bibcode:2014Natur.509..575K. doi:10.1038/nature13302. PMC 4403737. PMID 24870542.
- ^ Martens, L; Chambers, M; Sturm, M; Kessner, D; Levander, F; Shofstahl, J; Tang, WH; Römpp, A; Neumann, S; Pizarro, AD; Montecchi-Palazzi, L; Tasman, N; Coleman, M; Reisinger, F; Souda, P; Hermjakob, H; Binz, PA; Deutsch, EW (January 2011). "mzML--a community standard for mass spectrometry data". Molecular & Cellular Proteomics. 10 (1): R110.000133. doi:10.1074/mcp.R110.000133. PMC 3013463. PMID 20716697.
- ^ Jones, AR; Eisenacher, M; Mayer, G; Kohlbacher, O; Siepen, J; Hubbard, SJ; Selley, JN; Searle, BC; Shofstahl, J; Seymour, SL; Julian, R; Binz, PA; Deutsch, EW; Hermjakob, H; Reisinger, F; Griss, J; Vizcaíno, JA; Chambers, M; Pizarro, A; Creasy, D (July 2012). "The mzIdentML data standard for mass spectrometry-based proteomics results". Molecular & Cellular Proteomics. 11 (7): M111.014381. doi:10.1074/mcp.M111.014381. PMC 3394945. PMID 22375074.
- ^ Deutsch, EW; Albar, JP; Binz, PA; Eisenacher, M; Jones, AR; Mayer, G; Omenn, GS; Orchard, S; Vizcaíno, JA; Hermjakob, H (May 2015). "Development of data representation standards by the human proteome organization proteomics standards initiative". Journal of the American Medical Informatics Association. 22 (3): 495–506. doi:10.1093/jamia/ocv001. PMC 4457114. PMID 25726569.
- ^ Griss, J; Jones, AR; Sachsenberg, T; Walzer, M; Gatto, L; Hartler, J; Thallinger, GG; Salek, RM; Steinbeck, C; Neuhauser, N; Cox, J; Neumann, S; Fan, J; Reisinger, F; Xu, QW; Del Toro, N; Pérez-Riverol, Y; Ghali, F; Bandeira, N; Xenarios, I; Kohlbacher, O; Vizcaíno, JA; Hermjakob, H (October 2014). "The mzTab data exchange format: communicating mass-spectrometry-based proteomics and metabolomics experimental results to a wider audience". Molecular & Cellular Proteomics. 13 (10): 2765–75. doi:10.1074/mcp.o113.036681. PMC 4189001. PMID 24980485.
- ^ Barsnes, H; Vizcaíno, JA; Eidhammer, I; Martens, L (2009). "PRIDE Converter: making proteomics data-sharing easy". Nat Biotechnol. 27 (7): 598–9. doi:10.1038/nbt0709-598. PMID 19587657. S2CID 205269351.
- ^ Côté, RG; Griss, J; Dianes, JA; Wang, R; Wright, JC; van den Toorn, HW; van Breukelen, B; Heck, AJ; Hulstaert, N; Martens, L; Reisinger, F; Csordas, A; Ovelleiro, D; Perez-Riverol, Y; Barsnes, H; Hermjakob, H; Vizcaíno, JA (December 2012). "The PRoteomics IDEntification (PRIDE) Converter 2 framework: an improved suite of tools to facilitate data submission to the PRIDE database and the ProteomeXchange consortium". Molecular & Cellular Proteomics. 11 (12): 1682–9. doi:10.1074/mcp.o112.021543. PMC 3518121. PMID 22949509.
- ^ Wang, R; Fabregat, A; Ríos, D; Ovelleiro, D; Foster, JM; Côté, RG; Griss, J; Csordas, A; Perez-Riverol, Y; Reisinger, F; Hermjakob, H; Martens, L; Vizcaíno, JA (Feb 2012). "PRIDE Inspector: a tool to visualize and validate MS proteomics data". Nature Biotechnology. 30 (2): 135–7. doi:10.1038/nbt.2112. PMC 3277942. PMID 22318026.
- ^ Vaudel, M; Burkhart, JM; Zahedi, RP; Oveland, E; Berven, FS; Sickmann, A; Martens, L; Barsnes, H (January 2015). "PeptideShaker enables reanalysis of MS-derived proteomics data sets". Nature Biotechnology. 33 (1): 22–4. doi:10.1038/nbt.3109. PMID 25574629. S2CID 27922651.
- ^ Reisinger, F; Del-Toro, N; Ternent, T; Hermjakob, H; Vizcaíno, JA (22 April 2015). "Introducing the PRIDE Archive RESTful web services". Nucleic Acids Research. 43 (W1): W599–604. doi:10.1093/nar/gkv382. PMC 4489246. PMID 25904633.
- ^ Vizcaíno, JA; Côté, R; Reisinger, F; Mueller, M; Foster, JM; Rameseder, J; Hermjakob, H; Martens, L (2009). "A guide to the Proteomics Identifications Database proteomics data repository". Proteomics. 9 (18): 4276–83. doi:10.1002/pmic.200900402. PMC 2970915. PMID 19662629.
- ^ Vizcaíno, JA; Deutsch, EW; Wang, R; Csordas, A; Reisinger, F; Ríos, D; Dianes, JA; Sun, Z; Farrah, T; Bandeira, N; Binz, PA; Xenarios, I; Eisenacher, M; Mayer, G; Gatto, L; Campos, A; Chalkley, RJ; Kraus, HJ; Albar, JP; Martinez-Bartolomé, S; Apweiler, R; Omenn, GS; Martens, L; Jones, AR; Hermjakob, H (March 2014). "ProteomeXchange provides globally coordinated proteomics data submission and dissemination". Nature Biotechnology. 32 (3): 223–6. doi:10.1038/nbt.2839. PMC 3986813. PMID 24727771.
- ^ Hermjakob, H; Apweiler, R (February 2006). "The Proteomics Identifications Database (PRIDE) and the ProteomExchange Consortium: making proteomics data accessible". Expert Review of Proteomics. 3 (1): 1–3. doi:10.1586/14789450.3.1.1. PMID 16445344.
- ^ "European Commission : CORDIS : Projects and Results : International Data Exchange and Data Representation Standards for Proteomics". cordis.europa.eu. Retrieved 2017-09-22.
- ^ Csordas, A; Wang, R; Ríos, D; Reisinger, F; Foster, JM; Slotta, DJ; Vizcaíno, JA; Hermjakob, H (May 2013). "From Peptidome to PRIDE: public proteomics data migration at a large scale". Proteomics. 13 (10–11): 1692–5. doi:10.1002/pmic.201200514. PMC 3717177. PMID 23533138.
- ^ Martens, L (May 2013). "Resilience in the proteomics data ecosystem: how the field cares for its data". Proteomics. 13 (10–11): 1548–50. doi:10.1002/pmic.201300118. hdl:1854/LU-4166053. PMID 23596016. S2CID 8041195.
- ^ "Peptidome - NCBI Peptide Data Resource". www.ncbi.nlm.nih.gov. Archived from the original on 2009-07-07.