Strategies for Selecting Key Interacting Proteins in IP-MS Studies
Immunoprecipitation-mass spectrometry (IP-MS) can identify many potential interacting proteins of a bait protein. Follow-up studies usually require selecting a few key proteins from these interactors to validate the interaction and explore the functional mechanism in depth. If IP-MS identifies only a small number of interactors, each one can be searched individually in PubMed, NCBI, UniProt, and other databases to retrieve relevant literature and functional descriptions. However, if IP-MS identifies a large number of interactors, or if literature and database searches return too much or too little information to guide the choice, the following strategies can help narrow down the proteins for subsequent research.
Selection of Key Interacting Proteins Based on Quantitative IP-MS Data
When selecting key interacting proteins based on quantitative data obtained from IP-MS, researchers commonly rely on two main metrics: Fold Change and Intensity. Fold Change signifies the magnitude of difference in protein levels between the IP experimental group and the control group, while Intensity reflects the signal strength of proteins detected in the IP sample, serving as a proxy for protein abundance.
Typically, proteins exhibiting larger Fold Change values and higher Intensity levels in the experimental group are given priority for further investigation. These proteins are presumed to play significant roles in the biological processes under study and are thus selected as potential targets for subsequent research endeavors.
Note that the Fold Change and Intensity values of interacting proteins in IP-MS experiments seldom surpass those of the bait protein, since interactors are co-purified through the bait. If a protein's Fold Change and Intensity values exceed those of the bait protein, caution is warranted: this may indicate non-specific binding or contamination rather than a genuine interaction. Such candidates should be validated carefully before they are treated as true interactors.
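The ranking rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene names, the intensity values, and the log2 fold-change cutoff of 1.0 are all invented, and a real analysis would also take replicates and statistical significance into account.

```python
import math

# Hypothetical IP-MS table: (gene, intensity in IP sample, intensity in control).
# All names and values are invented for illustration.
rows = [
    ("BAIT1", 8.0e9, 1.0e7),
    ("PROT_A", 2.0e9, 1.0e7),
    ("PROT_B", 5.0e8, 2.5e8),
    ("PROT_C", 1.6e10, 1.0e7),
]

def rank_candidates(rows, bait, min_log2_fc=1.0):
    """Rank interactors by log2 fold change, then by IP intensity.

    The last field of each result flags proteins whose fold change AND
    intensity both exceed those of the bait (possible artifacts).
    """
    table = {gene: (math.log2(ip / ctrl), ip) for gene, ip, ctrl in rows}
    bait_fc, bait_intensity = table[bait]
    ranked = []
    for gene, (fc, intensity) in table.items():
        if gene == bait or fc < min_log2_fc:
            continue  # skip the bait itself and weakly enriched proteins
        exceeds_bait = fc > bait_fc and intensity > bait_intensity
        ranked.append((gene, round(fc, 2), intensity, exceeds_bait))
    ranked.sort(key=lambda r: (r[1], r[2]), reverse=True)
    return ranked

ranked = rank_candidates(rows, bait="BAIT1")
for entry in ranked:
    print(entry)
```

In this toy table PROT_C ranks first but is flagged because it exceeds the bait on both metrics, which is the situation the text advises treating with caution.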
Selection of Key Interacting Proteins Based on Protein-Protein Interaction Networks
Building and visualizing interaction networks with the STRING protein-protein interaction database and Cytoscape software is common practice in proteomics research. STRING is one of the largest and most comprehensive repositories of protein-protein interactions, and Cytoscape is a widely used, versatile tool for network visualization and analysis. By combining IP-MS data with publicly available data from protein-protein interaction databases, a comprehensive protein interaction network can be constructed and visualized through the following steps:
Installation of Cytoscape Software and StringApp Plugin:
Cytoscape is an open-source software tool for integrating, analyzing, and visualizing network data. At the time of writing, the latest version was 3.9.1, which can be downloaded free of charge from the official website (https://cytoscape.org). This version requires Java 11; the specific requirements are listed on the official website. After installing Cytoscape, go to Apps > App Manager > Install Apps to search for and install the StringApp plugin, which retrieves and integrates interaction network data.
Retrieval of Interaction Proteins Based on the STRING Database:
STRING interaction data can be accessed directly through its official website (https://cn.string-db.org) or through the StringApp plugin within Cytoscape. After installing StringApp, enter the gene name or UniProt accession number of the target protein in the Cytoscape search bar. Specify the species, network type, confidence score cutoff, and maximum number of external interactors to extract and visualize the protein interaction network.
For the network type, “physical subnetwork” includes only interactions with evidence of physical binding or protein complex formation, while the “full STRING network” also includes interactions predicted from sources such as neighboring genes, text mining, co-expression, and co-localization. It is recommended to select “physical subnetwork” here.
The confidence score cutoff is the minimum reliability score, as calculated by STRING, that an interaction must reach to be included. STRING treats a score of 0.7 as high confidence, 0.4 as medium confidence, and 0.15 as low confidence. The default cutoff is 0.4.
The maximum external interactors indicate the maximum number of interactors to be extracted. Cytoscape can retrieve network data for up to 100 interactors. If the database contains fewer interactors meeting the criteria, only those will be displayed. If the number of interactors meeting the criteria exceeds the set limit, only the specified number will be shown. It is recommended to select up to 100 interactors.
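For readers who prefer scripting, the same query parameters can be expressed as a request to the STRING REST API. The sketch below only builds the URL and sends nothing. The host `string-db.org`, the example identifiers, and the human taxon ID 9606 are assumptions for illustration, and the parameter names should be checked against the current STRING API documentation before use.

```python
from urllib.parse import urlencode

def string_network_url(identifiers, species=9606, confidence=0.4, physical=True):
    """Build a STRING REST API network URL (no request is sent).

    confidence is the 0-1 cutoff used in the text; the API expects it
    on a 0-1000 scale as `required_score`.
    """
    params = {
        # STRING separates multiple identifiers with a carriage return (%0D).
        "identifiers": "\r".join(identifiers),
        "species": species,  # NCBI taxonomy ID; 9606 = human (assumed example)
        "required_score": int(round(confidence * 1000)),
        "network_type": "physical" if physical else "functional",
    }
    return "https://string-db.org/api/tsv/network?" + urlencode(params)

url = string_network_url(["TP53", "MDM2"], confidence=0.4, physical=True)
print(url)
```

Choosing `physical=True` mirrors the “physical subnetwork” recommendation, and `confidence=0.4` mirrors the default medium-confidence cutoff.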
Construction of Protein Interaction Networks Based on IP-MS Quantitative Data:
The IP-MS results provide the list of proteins that interact with the target protein. Format these proteins as a two-column table, with the bait in a “Bait” column and each interacting protein in an “Interactor” column, and import the table into Cytoscape using the “import network from file” function. Designate the “Bait” column as the source node and the “Interactor” column as the target node. Proteins can be entered as gene names or UniProt accessions to generate a protein interaction network based on IP-MS quantitative data.
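A minimal sketch of preparing such a two-column file is shown below. The bait and interactor names are hypothetical, and tab-separated text is just one of the formats that Cytoscape's file import accepts.

```python
import csv
import io

def bait_interactor_table(bait, interactors):
    """Return tab-separated text with one bait-interactor pair per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["Bait", "Interactor"])  # header names used in the text
    for gene in interactors:
        writer.writerow([bait, gene])
    return buffer.getvalue()

# Hypothetical bait and IP-MS hits, for illustration only.
table_text = bait_interactor_table("BAIT1", ["PROT_A", "PROT_B"])
print(table_text)

# To save the table for import into Cytoscape:
# with open("ipms_network.tsv", "w", newline="") as handle:
#     handle.write(table_text)
```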
Construction of Protein Interaction Networks Based on the STRING Database:
Combine the list of interactors retrieved from the STRING database (step 2, the retrieval step above) with the list of IP-MS interacting proteins (step 3, the IP-MS network step above). Enter the bait protein and all interacting proteins into the Cytoscape search bar, with the maximum external interactors set to 0 so that no additional proteins are added. This generates a network of the bait protein and all interacting proteins based on information from the STRING database. Make sure that the “display name” of each protein in the STRING interactor list matches the name used for the same protein in the IP-MS list.
Integration and Visualization of Protein Interaction Networks:
Use Cytoscape's Merge function to integrate the STRING-based and IP-MS-based protein interaction networks. Before merging, make sure that the “shared name” and “name” columns of the nodes in both networks use the same type of identifier. Use the f(x) function in the Node table to map “shared name” and “name” to “display name” or “canonical name”, so that they match the gene names or UniProt accessions used in the IP-MS interaction network. Then use the “import data from file” function to import the IP-MS quantitative data into Cytoscape. Finally, adjust the parameters in the Style control panel so that node shape, color, and size represent the source, IP-MS fold change, and P-value of each interacting protein, respectively. The result is a single network diagram that shows both the known and the newly identified interactors.
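The “source” attribute mapped to node appearance can be prepared before import. The sketch below is one possible way to do this outside Cytoscape, not part of the Cytoscape workflow itself. It assumes that names can be harmonized simply by trimming whitespace and upper-casing, which suits human gene symbols but may not suit other species or identifier types; all protein names are examples.

```python
def node_sources(string_nodes, ipms_nodes):
    """Label each protein as found in STRING, in IP-MS, or in both."""
    def normalize(name):
        # Assumption: trimming and upper-casing is enough to match names.
        return name.strip().upper()

    in_string = {normalize(n) for n in string_nodes}
    in_ipms = {normalize(n) for n in ipms_nodes}
    sources = {}
    for name in sorted(in_string | in_ipms):
        if name in in_string and name in in_ipms:
            sources[name] = "both"
        elif name in in_string:
            sources[name] = "STRING"
        else:
            sources[name] = "IP-MS"
    return sources

# Example lists with inconsistent case and spacing, for illustration only.
sources = node_sources(["Mdm2", "TP53 "], ["TP53", "PROT_A"])
for name, source in sources.items():
    print(name, source, sep="\t")
```

Proteins labeled “IP-MS” only are the newly identified interactors, which are often the most interesting candidates for follow-up.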
The interaction network serves not only to display IP-MS results but also to select key interacting proteins for further research focus.
In protein interaction networks, the most central nodes often hold more significant functional roles. The Cytohubba plugin in Cytoscape offers various algorithms to score and rank node importance, aiding in the selection of crucial nodes within the interaction network.
After installing the Cytohubba plugin in Cytoscape, the Cytohubba control panel will appear on the left side. Clicking “calculate” will compute the importance scores of nodes within the interaction network. Cytohubba provides 12 different algorithms for scoring importance. After selecting an algorithm, submitting will display a protein interaction subnetwork consisting of top-scored key node proteins according to that algorithm. These proteins can be chosen as key proteins for subsequent research. If all proteins in this subnetwork are known interactors from the STRING database, the number of nodes in the subnetwork can be increased appropriately to include newly discovered interactors from IP-MS.
The developers of Cytohubba tested their algorithms on the yeast protein interaction network, where MCC and DMNC performed best. However, because interaction networks may differ across biological systems, it is often necessary to compare and integrate the results of several algorithms before making a selection.
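The idea of comparing several scoring methods can be illustrated with a small consensus ranking. The sketch below is not Cytohubba's implementation: it uses two simple, standard measures (degree and closeness centrality) on an invented network and averages their ranks. Cytohubba's own algorithms, such as MCC, are defined differently.

```python
from collections import deque

def build_adjacency(edges):
    """Turn an undirected edge list into a node -> neighbor-set mapping."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def degree_scores(adj):
    return {node: len(neighbors) for node, neighbors in adj.items()}

def closeness_scores(adj):
    """Closeness = reachable nodes / sum of shortest-path lengths (BFS)."""
    scores = {}
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total = sum(dist.values())
        scores[start] = (len(dist) - 1) / total if total else 0.0
    return scores

def ranks(scores):
    """Rank 1 = highest score; ties are broken alphabetically."""
    ordered = sorted(scores, key=lambda n: (-scores[n], n))
    return {node: i + 1 for i, node in enumerate(ordered)}

def consensus_order(adj):
    """Order nodes by their mean rank across the scoring methods."""
    tables = [ranks(degree_scores(adj)), ranks(closeness_scores(adj))]
    mean_rank = {n: sum(t[n] for t in tables) / len(tables) for n in adj}
    return sorted(adj, key=lambda n: (mean_rank[n], n))

# Invented toy network, for illustration only.
edges = [("BAIT1", "PROT_A"), ("BAIT1", "PROT_B"), ("BAIT1", "PROT_C"),
         ("PROT_A", "PROT_B"), ("PROT_C", "PROT_D")]
order = consensus_order(build_adjacency(edges))
print(order)
```

Averaging ranks is only one simple way to combine methods; taking the intersection of the top-ranked nodes from each algorithm is another common choice.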
Selection of Key Interacting Proteins Based on Functional Annotation and Enrichment Analysis
Functional annotation and enrichment analysis of differential proteins using databases such as GO and KEGG are among the most common bioinformatics analyses in proteomics research. This methodology can also be applied in interactome studies. Tools like Metascape, Panther, relevant functional annotation and enrichment R packages, or the annotation enrichment tools integrated within the STRING database can be used to annotate and analyze the enriched pathways and protein functional categories of the interacting proteins identified through IP-MS.
Functional annotation and enrichment analysis identify the signaling pathways and protein functional categories that are significantly enriched within the interactome. Proteins from the most significantly enriched categories, or from categories of particular functional interest, can then be selected for further validation. Alternatively, all annotated categories can be considered, and the proteins with the largest differences between the IP and control groups within each category chosen for subsequent research.
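Enrichment tools commonly assess over-representation with a hypergeometric (one-sided Fisher) test. The sketch below shows that calculation for a single category under assumed, invented counts. It is not the exact procedure of any particular tool named above, and it omits the multiple-testing correction that those tools apply across categories.

```python
from math import comb

def hypergeom_enrichment_p(background, annotated, selected, overlap):
    """P(X >= overlap) for a hypergeometric draw.

    background: number of proteins in the reference set
    annotated:  reference proteins that belong to the category
    selected:   number of interactors identified by IP-MS
    overlap:    interactors that belong to the category
    """
    upper = min(annotated, selected)
    total = comb(background, selected)
    tail = sum(
        comb(annotated, k) * comb(background - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Invented counts: 80 interactors, 6 of which fall in a 150-protein pathway,
# against a 20,000-protein background.
p_value = hypergeom_enrichment_p(20000, 150, 80, 6)
print(f"p = {p_value:.3g}")
```

A small p-value suggests that the category is over-represented among the interactors compared with what random sampling from the background would give.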
Finally, some proteins that are identified as interactors and pass the filters described above are still usually not selected for functional and mechanistic follow-up. These are often highly abundant cellular proteins: ribosomal proteins (the RPS protein family), which may be recovered because of their involvement in translation; cytoskeletal proteins (e.g., Actin), which may bind non-specifically because they are so widespread in cells; and keratins (the Keratin protein family), which are likely contaminants introduced by handling during the experiment.
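A simple way to apply this caution is to flag such proteins in the candidate list instead of silently removing them. The sketch below matches gene-symbol prefixes. The prefixes (RPS, KRT, ACTB, ACTG) are assumptions based on common human gene nomenclature, and prefix matching is crude, so flagged entries should be reviewed by hand.

```python
# Assumed gene-symbol prefixes for the protein families named in the text.
SUSPECT_PREFIXES = {
    "RPS": "ribosomal protein",
    "KRT": "keratin",
    "ACTB": "actin",
    "ACTG": "actin",
}

def flag_common_background(genes):
    """Map each gene to a background-family label, or None if not flagged."""
    flags = {}
    for gene in genes:
        symbol = gene.upper()
        flags[gene] = next(
            (label for prefix, label in SUSPECT_PREFIXES.items()
             if symbol.startswith(prefix)),
            None,
        )
    return flags

# Example candidate list; PROT_A is a hypothetical placeholder name.
flags = flag_common_background(["RPS6", "KRT1", "ACTB", "PROT_A"])
for gene, label in flags.items():
    print(gene, label or "not flagged", sep="\t")
```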
Summary
After obtaining a list of interacting proteins through IP-MS, the following strategies can be referenced to select key proteins for subsequent exploration and validation of functionality and interaction mechanisms:
Sequentially search for literature and functional descriptions related to each interacting protein in databases such as PubMed and UniProt.
Choose proteins that rank highly by fold change and by detection signal intensity in the experimental group.
Extract key subnetworks from the interaction network using Cytohubba or MCODE plugins in Cytoscape, and select protein nodes from within.
Select proteins based on functional annotation and enrichment analysis targeting specific signaling pathways or functional categories.
Exercise caution when selecting abundant proteins that are commonly present, such as RPS, Keratin, Actin, etc.