Reference
chemoecology_tools
Chemoecology tools for chemical ecology analysis.
- class chemoecology_tools.GCMSExperiment(abundance_df, metadata_df, id_col='ID', experiment_name=None, chemical_metadata=None)
Gas Chromatography-Mass Spectrometry (GCMS) experimental data container.
Manages GCMS abundance data, experimental metadata, and chemical properties.
- Attributes:
abundance_df: DataFrame containing GCMS chemical abundance measurements
metadata_df: DataFrame containing sample and experimental metadata
id_col: Column name used to join abundance and metadata
experiment_name: Optional identifier for the experiment
chemical_metadata: Dictionary of chemical properties from config
- Parameters:
abundance_df (pd.DataFrame)
metadata_df (pd.DataFrame)
id_col (str)
experiment_name (str | None)
chemical_metadata (dict[str, dict[str, Any]] | None)
- calculate_relative_abundance()
Calculate relative abundance of chemical compounds.
- Returns:
GCMSExperiment with relative abundance values
- Return type:
GCMSExperiment
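The reference does not spell out the formula behind calculate_relative_abundance(). A minimal sketch, assuming relative abundance means each compound's value divided by the total for its sample (row); the table and compound names are made up for illustration and this is plain pandas, not the library's own code:

```python
import pandas as pd

# Hypothetical abundance table: rows are samples, columns are compounds.
abundance = pd.DataFrame(
    {"limonene": [2.0, 1.0], "pinene": [6.0, 1.0], "heptacosane": [2.0, 2.0]},
    index=["s1", "s2"],
)

# Assumption: relative abundance = value / total abundance of that sample.
row_totals = abundance.sum(axis=1)
relative = abundance.div(row_totals, axis=0)

print(relative)
```

Under this assumption every row sums to 1, which is consistent with filter_trace_compounds() requiring a threshold between 0 and 1. A sample whose total is 0 would produce NaN values here and would need separate handling.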
- filter(metadata_mask=None, chemical_mask=None)
Filter experiment using boolean masks for metadata and/or chemicals.
- Args:
metadata_mask: Boolean Series for filtering metadata rows
chemical_mask: Boolean Series for filtering chemical columns
- Returns:
New GCMSExperiment with filtered data
- Example:
# Filter based on both metadata and chemicals
meta_mask = exp.metadata_df["Species"].isin(["ant", "bee"])
chem_mask = pd.Series(
    [exp.get_chemical_property(c, "class") == "terpene" for c in exp.chemical_cols],
    index=exp.chemical_cols,
)
filtered_exp = exp.filter(metadata_mask=meta_mask, chemical_mask=chem_mask)
- Parameters:
metadata_mask (pd.Series[bool] | None)
chemical_mask (pd.Series[bool] | None)
- Return type:
GCMSExperiment
- filter_samples(criteria)
Filter samples based on metadata criteria.
- Args:
criteria: Filtering criteria {column: values_to_exclude}
- Returns:
New GCMSExperiment with filtered data
- Parameters:
criteria (dict[str, list[str]])
- Return type:
GCMSExperiment
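Note that the criteria dictionary lists values to exclude, not values to keep. A minimal sketch of that exclusion logic in plain pandas, assuming a sample is dropped when it matches any listed value in any listed column (the reference does not state how multiple columns combine); the metadata table is hypothetical:

```python
import pandas as pd

# Hypothetical sample metadata.
metadata = pd.DataFrame(
    {
        "ID": ["a", "b", "c", "d"],
        "Species": ["ant", "bee", "wasp", "ant"],
        "Caste": ["worker", "queen", "worker", "queen"],
    }
)

# Same shape as the documented argument: {column: values_to_exclude}.
criteria = {"Species": ["wasp"], "Caste": ["queen"]}

# Assumption: a row is kept only if it matches none of the excluded values.
keep = pd.Series(True, index=metadata.index)
for column, excluded in criteria.items():
    keep &= ~metadata[column].isin(excluded)

kept = metadata[keep]
print(kept)
```

With these inputs only sample "a" survives: "c" is a wasp, and "b" and "d" are queens.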
- filter_trace_compounds(threshold=0.005)
Filter out trace chemical amounts below threshold.
- Args:
threshold: Minimum abundance value to keep (lower values set to 0)
- Returns:
GCMSExperiment with filtered abundance values
- Raises:
ValueError: If threshold is not between 0 and 1
- Parameters:
threshold (float)
- Return type:
GCMSExperiment
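The documented behaviour (values below the threshold set to 0, ValueError for a threshold outside 0 to 1) can be sketched in plain pandas. The helper name zero_trace_values is hypothetical, and two details are assumptions: the bounds check is inclusive, and values exactly equal to the threshold are kept.

```python
import pandas as pd


def zero_trace_values(relative: pd.DataFrame, threshold: float = 0.005) -> pd.DataFrame:
    """Return a copy with every value below `threshold` replaced by 0."""
    # Assumption: 0 and 1 themselves are accepted as thresholds.
    if not 0 <= threshold <= 1:
        raise ValueError("threshold must be between 0 and 1")
    # Keep values at or above the threshold, zero the rest.
    return relative.where(relative >= threshold, 0.0)


# Hypothetical relative abundances (each row sums to 1).
relative = pd.DataFrame(
    {"limonene": [0.996, 0.5], "pinene": [0.004, 0.5]},
    index=["s1", "s2"],
)

cleaned = zero_trace_values(relative)
print(cleaned)
```

This sketch does not renormalize rows after zeroing, so a cleaned row may sum to slightly less than 1. Whether the library renormalizes is not stated in the reference.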
- classmethod from_files(abundance_path, metadata_path, user_chemical_metadata=None, fetch_pubchem=True, id_col='ID', filter_dict=None, experiment_name=None)
Create experiment from data files.
- Args:
abundance_path: Path to abundance data file
metadata_path: Path to metadata file
user_chemical_metadata: Optional path to chemical properties YAML
fetch_pubchem: Whether to fetch PubChem data for chemicals
id_col: Column name to join on
filter_dict: Optional filtering criteria {column: values_to_exclude}
experiment_name: Optional experiment identifier
- Returns:
New GCMSExperiment instance
- Parameters:
abundance_path (str | Path)
metadata_path (str | Path)
user_chemical_metadata (str | Path | None)
fetch_pubchem (bool)
id_col (str)
filter_dict (dict[str, list[str]] | None)
experiment_name (str | None)
- Return type:
GCMSExperiment
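The role of id_col is easiest to see as a join between the two tables. A minimal sketch in plain pandas, assuming CSV input and an inner join; the reference specifies neither the file format nor the join type, and the in-memory files below stand in for real paths:

```python
import io

import pandas as pd

# In-memory stand-ins for the abundance and metadata files (CSV is an assumption).
abundance_csv = io.StringIO("ID,limonene,pinene\ns1,2.0,6.0\ns2,1.0,1.0\n")
metadata_csv = io.StringIO("ID,Species\ns1,ant\ns2,bee\n")

abundance = pd.read_csv(abundance_csv)
metadata = pd.read_csv(metadata_csv)

# Both tables must share the identifier column named by id_col.
id_col = "ID"
merged = metadata.merge(abundance, on=id_col, how="inner")

print(merged)
```

With an inner join, samples present in only one of the two files would be dropped silently, so it is worth checking that the ID values match before loading.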
- get_abundance_matrix()
Get chemical abundance matrix.
- Returns:
DataFrame containing only chemical abundance measurements
- Return type:
DataFrame
- get_chemical_property(chemical, property_name, default=None)
Get property value for a chemical.
- Args:
chemical: Name of the chemical
property_name: Name of the property to retrieve
default: Value to return if property not found
- Returns:
Property value or default if not found
- Parameters:
chemical (str)
property_name (str)
default (Any | None)
- Return type:
Any
- get_chemicals_by_property(property_name, value)
Get chemicals that have a specific property value.
- Args:
property_name: Name of the property to match
value: Value to match
- Returns:
List of chemical names with matching property
- Parameters:
property_name (str)
value (Any)
- Return type:
list[str]
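Both property lookups operate on the chemical_metadata mapping, typed as dict[str, dict[str, Any]]. A minimal sketch of the two lookups over such a dictionary; the function names get_property and chemicals_with and the sample entries are hypothetical stand-ins, not the library's implementation:

```python
from typing import Any

# Hypothetical chemical metadata: {chemical name: {property name: value}}.
chemical_metadata: dict[str, dict[str, Any]] = {
    "limonene": {"class": "terpene"},
    "pinene": {"class": "terpene"},
    "heptacosane": {"class": "alkane"},
}


def get_property(chemical: str, property_name: str, default: Any = None) -> Any:
    """Look up one property, falling back to `default` when it is missing."""
    return chemical_metadata.get(chemical, {}).get(property_name, default)


def chemicals_with(property_name: str, value: Any) -> list[str]:
    """List the chemicals whose property equals `value`."""
    return [
        name
        for name, props in chemical_metadata.items()
        if props.get(property_name) == value
    ]


print(get_property("limonene", "class"))
print(chemicals_with("class", "terpene"))
```

In this sketch an unknown chemical and an unknown property both fall back to the default rather than raising, which matches the documented "Property value or default if not found".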
- chemoecology_tools.perform_nmds(experiment, n_components=2, random_state=42)
Perform non-metric multidimensional scaling (NMDS) on chemical data.
- Args:
experiment: GCMSExperiment instance containing the data
n_components: Number of dimensions to reduce to
random_state: Random seed for reproducibility
- Returns:
DataFrame containing NMDS coordinates (NMDS1, NMDS2)
- Parameters:
experiment (GCMSExperiment)
n_components (int)
random_state (int)
- Return type:
DataFrame
- chemoecology_tools.plot_nmds(experiment, nmds_coords, group_col=None, title='NMDS Plot', width=10, aspect_ratio=0.618)
Create a styled NMDS plot for GCMS experiment data.
- Args:
experiment: GCMSExperiment instance containing the data.
nmds_coords: DataFrame containing NMDS coordinates (NMDS1, NMDS2).
group_col: Optional metadata column name to group/color points by. Defaults to None.
title: Plot title. Defaults to "NMDS Plot".
width: Figure width in inches. Defaults to FIGURE_SETTINGS["default_width"].
aspect_ratio: Height/width ratio for the figure. Defaults to FIGURE_SETTINGS["golden_ratio"].
- Returns:
Figure: Matplotlib Figure object containing the styled plot.
- Parameters:
experiment (GCMSExperiment)
nmds_coords (DataFrame)
group_col (str | None)
title (str)
width (float)
aspect_ratio (float)
- Return type:
Figure
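Because aspect_ratio is defined as height divided by width, the figure height follows directly from the two arguments. A small sketch of that arithmetic using the defaults from the signature; the helper figure_size is hypothetical and only illustrates the relationship:

```python
def figure_size(width: float = 10, aspect_ratio: float = 0.618) -> tuple[float, float]:
    """Return (width, height) in inches, with height = width * aspect_ratio."""
    return (width, width * aspect_ratio)


width_in, height_in = figure_size()
print(f"{width_in} x {height_in:.2f} inches")
```

With the defaults this gives a figure 10 inches wide and about 6.18 inches tall. Passing aspect_ratio=1 would give a square figure.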