Reference

chemoecology_tools

Tools for chemical ecology analysis of GCMS data: experiment loading and filtering, NMDS ordination, and plotting.

class chemoecology_tools.GCMSExperiment(abundance_df, metadata_df, id_col='ID', experiment_name=None, chemical_metadata=None)

Gas Chromatography-Mass Spectrometry (GCMS) experimental data container.

Manages GCMS abundance data, experimental metadata, and chemical properties.

Attributes:

  • abundance_df: DataFrame containing GCMS chemical abundance measurements

  • metadata_df: DataFrame containing sample and experimental metadata

  • id_col: Column name used to join abundance and metadata

  • experiment_name: Optional identifier for the experiment

  • chemical_metadata: Dictionary of chemical properties keyed by chemical name (from config)

Parameters:
  • abundance_df (pd.DataFrame)

  • metadata_df (pd.DataFrame)

  • id_col (str)

  • experiment_name (str | None)

  • chemical_metadata (dict[str, dict[str, Any]] | None)
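The two tables share the id_col column, which is how rows are matched. The sketch below builds hypothetical toy inputs in the shape the parameter types suggest: abundance_df with one row per sample and one column per compound, metadata_df keyed by the same IDs, and chemical_metadata mapping each compound name to a dict of properties. The compound names and values are made up, and the wide (samples x compounds) layout is an assumption inferred from the method descriptions rather than something stated here.

```python
import pandas as pd

# Hypothetical toy data: one row per sample, one column per compound,
# plus the shared "ID" column used to join the two tables.
abundance_df = pd.DataFrame({
    "ID": ["s1", "s2", "s3"],
    "limonene": [12.0, 0.0, 3.5],
    "alpha-pinene": [4.0, 8.0, 1.5],
})

# Metadata: one row per sample, keyed by the same "ID" values.
metadata_df = pd.DataFrame({
    "ID": ["s1", "s2", "s3"],
    "Species": ["ant", "bee", "ant"],
})

# chemical_metadata maps compound name -> {property name -> value},
# matching the dict[str, dict[str, Any]] type listed above.
chemical_metadata = {
    "limonene": {"class": "terpene"},
    "alpha-pinene": {"class": "terpene"},
}

# Sanity check (our assumption, not a documented requirement):
# every sample ID appears in both tables.
assert set(abundance_df["ID"]) == set(metadata_df["ID"])
```

With chemoecology_tools installed, these objects would be passed as GCMSExperiment(abundance_df, metadata_df, id_col="ID", chemical_metadata=chemical_metadata).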

calculate_relative_abundance()

Calculate relative abundance of chemical compounds.

Returns:

GCMSExperiment with relative abundance values

Return type:

GCMSExperiment
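The reference does not say how relative abundance is computed. A common convention, assumed here, is to divide each sample's compound amounts by that sample's total so each row sums to 1. The helper relative_abundance below is hypothetical and operates on a plain DataFrame rather than a GCMSExperiment.

```python
import pandas as pd

def relative_abundance(abundance: pd.DataFrame, id_col: str = "ID") -> pd.DataFrame:
    """Hypothetical helper: scale each sample's compounds so the row sums to 1."""
    chem = abundance.drop(columns=[id_col])
    # Divide every value by its row total; axis=0 aligns the totals with rows.
    # Note: a sample whose compounds are all zero would produce NaN here.
    rel = chem.div(chem.sum(axis=1), axis=0)
    return pd.concat([abundance[[id_col]], rel], axis=1)

abundance_df = pd.DataFrame({
    "ID": ["s1", "s2"],
    "limonene": [3.0, 0.0],
    "alpha-pinene": [1.0, 5.0],
})
rel = relative_abundance(abundance_df)
print(rel)
```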

filter(metadata_mask=None, chemical_mask=None)

Filter experiment using boolean masks for metadata and/or chemicals.

Args:

  • metadata_mask: Boolean Series for filtering metadata rows

  • chemical_mask: Boolean Series for filtering chemical columns

Returns:

New GCMSExperiment with filtered data

Example:

    import pandas as pd

    # Filter based on both metadata and chemicals
    meta_mask = exp.metadata_df["Species"].isin(["ant", "bee"])
    chem_mask = pd.Series([
        exp.get_chemical_property(c, "class") == "terpene"
        for c in exp.chemical_cols
    ], index=exp.chemical_cols)

    filtered_exp = exp.filter(
        metadata_mask=meta_mask, chemical_mask=chem_mask
    )

Parameters:
  • metadata_mask (pd.Series[bool] | None)

  • chemical_mask (pd.Series[bool] | None)

Return type:

GCMSExperiment
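To make the mask semantics concrete without the library, the sketch below reproduces the same two masks on plain DataFrames and applies them by hand. The assumption is that metadata_mask keeps metadata rows (and the abundance rows with the same IDs) and chemical_mask keeps the chemical columns marked True; the toy tables and the chem_class lookup are hypothetical stand-ins for the experiment and get_chemical_property.

```python
import pandas as pd

abundance_df = pd.DataFrame({
    "ID": ["s1", "s2", "s3"],
    "limonene": [0.6, 0.1, 0.3],
    "hexanal": [0.4, 0.9, 0.7],
})
metadata_df = pd.DataFrame({
    "ID": ["s1", "s2", "s3"],
    "Species": ["ant", "wasp", "bee"],
})
# Hypothetical stand-in for get_chemical_property(c, "class").
chem_class = {"limonene": "terpene", "hexanal": "aldehyde"}
chemical_cols = ["limonene", "hexanal"]

# Row mask over metadata, column mask over chemicals (both boolean Series).
meta_mask = metadata_df["Species"].isin(["ant", "bee"])
chem_mask = pd.Series(
    [chem_class[c] == "terpene" for c in chemical_cols], index=chemical_cols
)

# Keep metadata rows where the mask is True, then keep the matching samples
# in the abundance table and only the chemical columns selected by chem_mask.
kept_meta = metadata_df[meta_mask]
kept_chems = chem_mask[chem_mask].index.tolist()
kept_abundance = abundance_df.loc[
    abundance_df["ID"].isin(kept_meta["ID"]), ["ID"] + kept_chems
]
print(kept_abundance)
```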

filter_samples(criteria)

Filter samples based on metadata criteria.

Args:

criteria: Filtering criteria {column: values_to_exclude}

Returns:

New GCMSExperiment with filtered data

Parameters:

criteria (dict[str, list[str]])

Return type:

GCMSExperiment
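The criteria dict lists values to exclude per column. The hypothetical helper below shows one reading of that, assuming a sample is dropped if it matches the exclusion list of any listed column; whether filter_samples combines columns this way is not stated in the reference.

```python
import pandas as pd

def exclude_samples(metadata: pd.DataFrame, criteria: dict[str, list[str]]) -> pd.DataFrame:
    """Hypothetical helper: drop rows whose value in any listed column is excluded."""
    keep = pd.Series(True, index=metadata.index)
    for column, values_to_exclude in criteria.items():
        # A row survives only if it avoids every column's exclusion list.
        keep &= ~metadata[column].isin(values_to_exclude)
    return metadata[keep]

metadata_df = pd.DataFrame({
    "ID": ["s1", "s2", "s3", "s4"],
    "Species": ["ant", "bee", "ant", "wasp"],
    "Site": ["A", "A", "B", "B"],
})
# Exclude all wasps and everything collected at site "B".
filtered = exclude_samples(metadata_df, {"Species": ["wasp"], "Site": ["B"]})
print(filtered["ID"].tolist())  # ['s1', 's2']
```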

filter_trace_compounds(threshold=0.005)

Filter out trace chemical amounts below threshold.

Args:

threshold: Minimum abundance value to keep (lower values set to 0)

Returns:

GCMSExperiment with filtered abundance values

Raises:

ValueError: If threshold is not between 0 and 1

Parameters:

threshold (float)

Return type:

GCMSExperiment
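Because the threshold must lie between 0 and 1, it presumably applies to relative abundances. The hypothetical helper below sketches the documented behaviour (values below the threshold set to 0, ValueError outside the range) on a compounds-only DataFrame; treating the bounds as inclusive is an assumption.

```python
import pandas as pd

def zero_trace_compounds(rel: pd.DataFrame, threshold: float = 0.005) -> pd.DataFrame:
    """Hypothetical helper: set relative abundances below `threshold` to 0."""
    # Inclusive bounds are an assumption; the reference only says "between 0 and 1".
    if not 0 <= threshold <= 1:
        raise ValueError("threshold must be between 0 and 1")
    # DataFrame.where keeps values where the condition holds, else uses `other`.
    return rel.where(rel >= threshold, 0.0)

rel = pd.DataFrame({"limonene": [0.996, 0.2], "hexanal": [0.004, 0.8]})
print(zero_trace_compounds(rel))
```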

classmethod from_files(abundance_path, metadata_path, user_chemical_metadata=None, fetch_pubchem=True, id_col='ID', filter_dict=None, experiment_name=None)

Create experiment from data files.

Args:

  • abundance_path: Path to abundance data file

  • metadata_path: Path to metadata file

  • user_chemical_metadata: Optional path to chemical properties YAML

  • fetch_pubchem: Whether to fetch PubChem data for chemicals

  • id_col: Column name to join on

  • filter_dict: Optional filtering criteria {column: values_to_exclude}

  • experiment_name: Optional experiment identifier

Returns:

New GCMSExperiment instance

Parameters:
  • abundance_path (str | Path)

  • metadata_path (str | Path)

  • user_chemical_metadata (str | Path | None)

  • fetch_pubchem (bool)

  • id_col (str)

  • filter_dict (dict[str, list[str]] | None)

  • experiment_name (str | None)

Return type:

GCMSExperiment
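The reference does not name the file format. The sketch below assumes both files are CSVs and uses a hypothetical load_tables helper to show the file-to-tables part of the workflow: read both files, apply filter_dict exclusions to the metadata, and keep only the abundance rows whose IDs survive. PubChem lookups and the YAML chemical metadata are left out.

```python
import tempfile
from pathlib import Path

import pandas as pd

def load_tables(abundance_path, metadata_path, id_col="ID", filter_dict=None):
    """Hypothetical loader: read two CSVs and drop excluded samples from both."""
    abundance = pd.read_csv(abundance_path)
    metadata = pd.read_csv(metadata_path)
    for column, values_to_exclude in (filter_dict or {}).items():
        metadata = metadata[~metadata[column].isin(values_to_exclude)]
    # Keep only abundance rows whose IDs survived the metadata filter.
    abundance = abundance[abundance[id_col].isin(metadata[id_col])]
    return abundance, metadata

with tempfile.TemporaryDirectory() as tmp:
    abundance_path = Path(tmp) / "abundance.csv"
    metadata_path = Path(tmp) / "metadata.csv"
    abundance_path.write_text("ID,limonene,hexanal\ns1,3.0,1.0\ns2,0.0,5.0\n")
    metadata_path.write_text("ID,Species\ns1,ant\ns2,wasp\n")
    abundance, metadata = load_tables(
        abundance_path, metadata_path, filter_dict={"Species": ["wasp"]}
    )

print(abundance)
```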

get_abundance_matrix()

Get chemical abundance matrix.

Returns:

DataFrame containing only chemical abundance measurements

Return type:

DataFrame
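"Only chemical abundance measurements" implies the join column is removed. One way to get a pure samples x compounds matrix, sketched below on a toy table, is to move the ID column into the index; whether get_abundance_matrix keeps the IDs as the index or drops them is not stated, so this is an assumption.

```python
import pandas as pd

abundance_df = pd.DataFrame({
    "ID": ["s1", "s2"],
    "limonene": [3.0, 0.0],
    "hexanal": [1.0, 5.0],
})

# Move the join key into the index so only compound columns remain.
matrix = abundance_df.set_index("ID")
print(matrix.shape)  # (2, 2): samples x compounds
```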

get_chemical_property(chemical, property_name, default=None)

Get property value for a chemical.

Args:

  • chemical: Name of the chemical

  • property_name: Name of the property to retrieve

  • default: Value to return if property not found

Returns:

Property value or default if not found

Parameters:
  • chemical (str)

  • property_name (str)

  • default (Any | None)

Return type:

Any
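Given the dict[str, dict[str, Any]] shape of chemical_metadata, the lookup can be expressed as a two-level dict access with a fallback. The standalone function below is a hypothetical sketch of that behaviour, returning default whether the chemical or the property is missing.

```python
from typing import Any

def get_chemical_property(
    chemical_metadata: dict[str, dict[str, Any]],
    chemical: str,
    property_name: str,
    default: Any = None,
) -> Any:
    """Hypothetical sketch: look up one property, falling back to `default`."""
    # Missing chemical -> empty dict -> default; missing property -> default.
    return chemical_metadata.get(chemical, {}).get(property_name, default)

chemical_metadata = {"limonene": {"class": "terpene", "formula": "C10H16"}}
print(get_chemical_property(chemical_metadata, "limonene", "class"))        # terpene
print(get_chemical_property(chemical_metadata, "hexanal", "class", "n/a"))  # n/a
```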

get_chemicals_by_property(property_name, value)

Get chemicals that have a specific property value.

Args:

  • property_name: Name of the property to match

  • value: Value to match

Returns:

List of chemical names with matching property

Parameters:
  • property_name (str)

  • value (Any)

Return type:

list[str]
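The reverse lookup is a filter over the same property dictionary. The hypothetical function below returns compound names in dictionary order and skips compounds that lack the property entirely; equality matching is assumed.

```python
from typing import Any

def get_chemicals_by_property(
    chemical_metadata: dict[str, dict[str, Any]], property_name: str, value: Any
) -> list[str]:
    """Hypothetical sketch: names of compounds whose property equals `value`."""
    return [
        name
        for name, props in chemical_metadata.items()
        if property_name in props and props[property_name] == value
    ]

chemical_metadata = {
    "limonene": {"class": "terpene"},
    "alpha-pinene": {"class": "terpene"},
    "hexanal": {"class": "aldehyde"},
}
print(get_chemicals_by_property(chemical_metadata, "class", "terpene"))
# ['limonene', 'alpha-pinene']
```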

get_metadata(columns=None)

Get metadata columns.

Args:

columns: Optional list of column names to return

Returns:

DataFrame containing requested metadata columns

Parameters:

columns (list[str] | None)

Return type:

DataFrame
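The reference does not say what happens when columns is None. The hypothetical helper below assumes the common convention of returning every metadata column in that case and a column subset otherwise.

```python
from typing import Optional

import pandas as pd

def get_metadata(metadata: pd.DataFrame, columns: Optional[list[str]] = None) -> pd.DataFrame:
    """Hypothetical sketch: all columns when `columns` is None, else just those."""
    if columns is None:
        return metadata.copy()
    return metadata[columns]  # raises KeyError for unknown column names

metadata_df = pd.DataFrame({"ID": ["s1", "s2"], "Species": ["ant", "bee"], "Site": ["A", "B"]})
print(get_metadata(metadata_df, ["ID", "Species"]).columns.tolist())  # ['ID', 'Species']
```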

merge()

Merge abundance and metadata.

Returns:

DataFrame with abundance and metadata joined on id_col

Return type:

DataFrame
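Joining on id_col matches rows by sample ID rather than by position, so the two tables do not need to be in the same order. The sketch below shows this with pandas' DataFrame.merge; the inner join (dropping IDs missing from either table) is an assumption about how merge() behaves.

```python
import pandas as pd

abundance_df = pd.DataFrame({"ID": ["s1", "s2"], "limonene": [3.0, 0.0]})
metadata_df = pd.DataFrame({"ID": ["s2", "s1"], "Species": ["bee", "ant"]})

# Join on the shared ID column; rows are matched by ID, not by position.
merged = abundance_df.merge(metadata_df, on="ID", how="inner")
print(merged)
```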

chemoecology_tools.perform_nmds(experiment, n_components=2, random_state=42)

Perform non-metric multidimensional scaling (NMDS) on chemical abundance data.

Args:

  • experiment: GCMSExperiment instance containing the data

  • n_components: Number of dimensions to reduce to

  • random_state: Random seed for reproducibility

Returns:

DataFrame containing NMDS coordinates (columns NMDS1, NMDS2 for the default n_components=2)

Parameters:
  • experiment (GCMSExperiment)

  • n_components (int)

  • random_state (int)

Return type:

DataFrame
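NMDS embeds samples so that their rank-ordered distances follow a dissimilarity matrix. The reference does not say which dissimilarity perform_nmds uses; Bray-Curtis is a common choice for compound profiles in chemical ecology, so the sketch below computes it with NumPy as an illustration of the input an NMDS step typically consumes. The helper name bray_curtis is hypothetical.

```python
import numpy as np

def bray_curtis(X: np.ndarray) -> np.ndarray:
    """Hypothetical helper: pairwise Bray-Curtis dissimilarity between rows of X."""
    X = np.asarray(X, dtype=float)
    # Sum over compounds of |x_i - x_j|, for every pair of samples (i, j).
    num = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    # Combined total abundance of samples i and j.
    # Note: an all-zero sample would make the diagonal 0/0 (NaN).
    den = X.sum(axis=1)[:, None] + X.sum(axis=1)[None, :]
    return num / den

# Rows are samples (relative abundances), columns are compounds.
X = np.array([[0.75, 0.25], [0.0, 1.0], [0.75, 0.25]])
D = bray_curtis(X)
print(D.round(3))
```

A matrix like D can then be passed to a non-metric MDS implementation that accepts precomputed dissimilarities, such as scikit-learn's sklearn.manifold.MDS; whether perform_nmds works this way is not documented here.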

chemoecology_tools.plot_nmds(experiment, nmds_coords, group_col=None, title='NMDS Plot', width=10, aspect_ratio=0.618)

Create a styled NMDS scatter plot for GCMS experiment data.

Args:

  • experiment: GCMSExperiment instance containing the data.

  • nmds_coords: DataFrame containing NMDS coordinates (NMDS1, NMDS2).

  • group_col: Optional metadata column name to group/color points by. Defaults to None.

  • title: Plot title. Defaults to "NMDS Plot".

  • width: Figure width in inches. Defaults to FIGURE_SETTINGS["default_width"] (10).

  • aspect_ratio: Height/width ratio for the figure. Defaults to FIGURE_SETTINGS["golden_ratio"] (0.618).

Returns:

Figure: Matplotlib Figure object containing the styled plot.

Parameters:
  • experiment (GCMSExperiment)

  • nmds_coords (DataFrame)

  • group_col (str | None)

  • title (str)

  • width (float)

  • aspect_ratio (float)

Return type:

Figure
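The width and aspect_ratio parameters determine the figure size as (width, width * aspect_ratio), so the defaults give a 10 x 6.18 inch figure. The hypothetical plot_nmds_sketch below shows that sizing plus one scatter series per group with plain matplotlib; the actual styling applied by plot_nmds is not documented here, and groups is passed as a Series instead of being looked up from experiment metadata.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import pandas as pd

def plot_nmds_sketch(coords, groups=None, title="NMDS Plot", width=10.0, aspect_ratio=0.618):
    """Hypothetical sketch: scatter NMDS1 vs NMDS2, one series per group."""
    # Height is derived from width, following the documented aspect_ratio meaning.
    fig, ax = plt.subplots(figsize=(width, width * aspect_ratio))
    if groups is None:
        ax.scatter(coords["NMDS1"], coords["NMDS2"])
    else:
        for label in pd.unique(groups):
            sel = (groups == label).to_numpy()
            ax.scatter(coords.loc[sel, "NMDS1"], coords.loc[sel, "NMDS2"], label=str(label))
        ax.legend(title=groups.name)
    ax.set_xlabel("NMDS1")
    ax.set_ylabel("NMDS2")
    ax.set_title(title)
    return fig

coords = pd.DataFrame({"NMDS1": [0.1, -0.2, 0.3], "NMDS2": [0.0, 0.4, -0.1]})
species = pd.Series(["ant", "bee", "ant"], name="Species")
fig = plot_nmds_sketch(coords, species)
```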

chemoecology_tools.setup_plotting_style()

Configure global matplotlib and seaborn plotting style settings.

Return type:

None
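Global style configuration in matplotlib works by updating matplotlib.rcParams, which affects every figure created afterwards. The sketch below applies a hypothetical STYLE dict that way; the specific settings chosen by setup_plotting_style are not documented here, so these values are placeholders.

```python
import matplotlib as mpl

# Hypothetical style values; the library's actual settings are not documented.
STYLE = {
    "font.size": 11,
    "axes.spines.top": False,
    "axes.spines.right": False,
    "savefig.dpi": 300,
}

def setup_plotting_style_sketch() -> None:
    """Apply STYLE to matplotlib's global rcParams (affects all later figures)."""
    mpl.rcParams.update(STYLE)

setup_plotting_style_sketch()
```

Because the original also configures seaborn, seaborn.set_theme would be the usual entry point for the seaborn side, though which options the library passes to it is not stated. For a temporary change, matplotlib.rc_context avoids altering global state.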