Reference

chemoecology_tools

Tools for chemical ecology analysis.

class chemoecology_tools.GCMSExperiment(abundance_df, metadata_df, id_col='ID', experiment_name=None, chemical_metadata=None)

Gas Chromatography-Mass Spectrometry (GCMS) experimental data container.

Manages GCMS abundance data, experimental metadata, and chemical properties.

Parameters:
  • abundance_df (DataFrame)

  • metadata_df (DataFrame)

  • id_col (str)

  • experiment_name (str | None)

  • chemical_metadata (dict[str, dict[str, Any]] | None)

abundance_df

DataFrame containing GCMS chemical abundance measurements

metadata_df

DataFrame containing sample and experimental metadata

id_col

Column name used to join abundance and metadata

experiment_name

Optional identifier for the experiment

chemical_metadata

Dictionary mapping each chemical name to a dictionary of its properties, loaded from config

calculate_relative_abundance()

Calculate relative abundance of chemical compounds.

Returns:

GCMSExperiment with relative abundance values

Return type:

GCMSExperiment
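
The normalization used is not spelled out above. A common convention, assumed here, divides each compound's abundance by its sample's total so that every row sums to 1. The pandas sketch below illustrates that idea with made-up compound names and values; it is not the library's implementation.

```python
import pandas as pd

# Hypothetical abundance table: one row per sample, one column per compound.
abundance = pd.DataFrame(
    {"ID": ["s1", "s2"], "limonene": [30.0, 10.0], "linalool": [10.0, 30.0]}
).set_index("ID")

# Assumed convention: divide each value by its sample (row) total.
relative = abundance.div(abundance.sum(axis=1), axis=0)
print(relative)
```

Under this assumption, a sample whose total is 0 would produce NaN values and would need separate handling.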

filter_samples(criteria)

Filter samples based on metadata criteria.

Parameters:

criteria (dict[str, list[str]]) – Filtering criteria {column: values_to_exclude}

Returns:

New GCMSExperiment with filtered data

Return type:

GCMSExperiment
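
Because criteria maps each column to the values to exclude, the filter can be pictured as a boolean mask over the metadata rows. A minimal pandas sketch, with a hypothetical caste column and sample IDs (how the library applies the criteria internally is an assumption):

```python
import pandas as pd

# Hypothetical sample metadata.
metadata = pd.DataFrame(
    {"ID": ["s1", "s2", "s3"], "caste": ["queen", "worker", "male"]}
)
criteria = {"caste": ["male"]}  # {column: values_to_exclude}

# Start by keeping every row, then drop rows matching any excluded value.
keep = pd.Series(True, index=metadata.index)
for column, excluded in criteria.items():
    keep &= ~metadata[column].isin(excluded)

filtered = metadata[keep]
print(filtered)
```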

filter_trace_compounds(threshold=0.005)

Filter out trace chemical amounts below threshold.

Parameters:

threshold (float) – Minimum abundance value to keep; values below it are set to 0

Returns:

GCMSExperiment with filtered abundance values

Raises:

ValueError – If threshold is not between 0 and 1

Return type:

GCMSExperiment
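
The behavior described above (validate the threshold, then zero out smaller values) can be sketched on a plain DataFrame. The helper name zero_trace_values is hypothetical, and treating 0 and 1 as valid thresholds is an assumption:

```python
import pandas as pd

def zero_trace_values(abundance: pd.DataFrame, threshold: float = 0.005) -> pd.DataFrame:
    """Return a copy with values below `threshold` replaced by 0."""
    # Assumption: the endpoints 0 and 1 are accepted.
    if not 0 <= threshold <= 1:
        raise ValueError("threshold must be between 0 and 1")
    # Keep values at or above the threshold; replace the rest with 0.
    return abundance.where(abundance >= threshold, 0.0)

# Hypothetical relative abundances for two samples.
relative = pd.DataFrame({"limonene": [0.997, 0.5], "linalool": [0.003, 0.5]})
cleaned = zero_trace_values(relative)
print(cleaned)
```

Since the default threshold is a small fraction, this filter is presumably meant to run on relative abundances rather than raw peak areas.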

classmethod from_files(abundance_path, metadata_path, user_chemical_metadata=None, fetch_pubchem=True, id_col='ID', filter_dict=None, experiment_name=None)

Create experiment from data files.

Parameters:
  • abundance_path (str | Path) – Path to abundance data file

  • metadata_path (str | Path) – Path to metadata file

  • user_chemical_metadata (str | Path | None) – Optional path to chemical properties YAML

  • fetch_pubchem (bool) – Whether to fetch PubChem data for chemicals

  • id_col (str) – Column name to join on

  • filter_dict (dict[str, list[str]] | None) – Optional filtering criteria {column: values_to_exclude}

  • experiment_name (str | None) – Optional experiment identifier

Returns:

New GCMSExperiment instance

Return type:

GCMSExperiment

get_abundance_matrix()

Get chemical abundance matrix.

Returns:

DataFrame containing only chemical abundance measurements

Return type:

DataFrame

get_chemical_property(chemical, property_name, default=None)

Get property value for a chemical.

Parameters:
  • chemical (str) – Name of the chemical

  • property_name (str) – Name of the property to retrieve

  • default (Any | None) – Value to return if property not found

Returns:

Property value or default if not found

Return type:

Any
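
Given the dict[str, dict[str, Any]] shape of chemical_metadata, the lookup amounts to a nested dictionary access with a fallback. A sketch with hypothetical entries and a hypothetical helper name (returning the default for an unknown chemical is an assumption):

```python
from typing import Any

# Hypothetical chemical metadata: {chemical: {property: value}}.
chemical_metadata: dict[str, dict[str, Any]] = {
    "limonene": {"class": "monoterpene", "formula": "C10H16"},
    "linalool": {"class": "monoterpene alcohol"},
}

def lookup_property(metadata, chemical, property_name, default=None):
    """Return the property value, or `default` if chemical or property is missing."""
    return metadata.get(chemical, {}).get(property_name, default)

print(lookup_property(chemical_metadata, "limonene", "formula"))
print(lookup_property(chemical_metadata, "linalool", "formula", default="unknown"))
```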

get_chemicals_by_property(property_name, value)

Get chemicals that have a specific property value.

Parameters:
  • property_name (str) – Name of the property to match

  • value (Any) – Value to match

Returns:

List of chemical names with matching property

Return type:

list[str]
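
The reverse lookup can be sketched as a filter over the same nested dictionary. The entries and the helper name chemicals_with are hypothetical, and skipping chemicals that lack the property is an assumption:

```python
# Hypothetical chemical metadata: {chemical: {property: value}}.
chemical_metadata = {
    "limonene": {"class": "monoterpene"},
    "beta-pinene": {"class": "monoterpene"},
    "linalool": {"class": "monoterpene alcohol"},
}

def chemicals_with(metadata, property_name, value):
    """List chemicals whose `property_name` is present and equals `value`."""
    return [
        name
        for name, props in metadata.items()
        if property_name in props and props[property_name] == value
    ]

monoterpenes = chemicals_with(chemical_metadata, "class", "monoterpene")
print(monoterpenes)
```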

get_metadata(columns=None)

Get metadata columns.

Parameters:

columns (list[str] | None) – Optional list of column names to return

Returns:

DataFrame containing requested metadata columns

Return type:

DataFrame

merge()

Merge abundance and metadata.

Returns:

DataFrame with joined abundance and metadata

Return type:

DataFrame
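
Conceptually, the result is a join of the two tables on id_col. The pandas sketch below uses hypothetical data and assumes an inner join with metadata columns first; the join type and column order the library actually uses are not stated here.

```python
import pandas as pd

# Hypothetical abundance and metadata tables sharing an "ID" column.
abundance = pd.DataFrame({"ID": ["s1", "s2"], "limonene": [0.75, 0.25]})
metadata = pd.DataFrame({"ID": ["s1", "s2"], "caste": ["queen", "worker"]})

# Assumed: inner join on the shared ID column.
merged = metadata.merge(abundance, on="ID", how="inner")
print(merged)
```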

chemoecology_tools.perform_nmds(experiment, n_components=2, random_state=42)

Perform non-metric multidimensional scaling (NMDS) on chemical abundance data.

Parameters:
  • experiment (GCMSExperiment) – GCMSExperiment instance containing the data

  • n_components (int) – Number of dimensions to reduce to

  • random_state (int) – Random seed for reproducibility

Returns:

DataFrame containing NMDS coordinates (NMDS1, NMDS2)

Return type:

DataFrame

chemoecology_tools.plot_nmds(experiment, nmds_coords, group_col=None, title='NMDS Plot', width=10, aspect_ratio=0.618)

Create a styled NMDS plot for GCMS experiment data.

Parameters:
  • experiment (GCMSExperiment) – GCMSExperiment instance containing the data

  • nmds_coords (DataFrame) – DataFrame containing NMDS coordinates (NMDS1, NMDS2)

  • group_col (str | None) – Optional metadata column name to group/color points by

  • title (str) – Plot title

  • width (float) – Figure width in inches

  • aspect_ratio (float) – Height/width ratio for the figure

Returns:

matplotlib Figure object containing the styled plot

Return type:

Figure
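
Since aspect_ratio is documented as height/width, the figure height follows from the width. A small sketch of that arithmetic, using a hypothetical helper name; how the library passes the size to matplotlib is an assumption:

```python
def figure_size(width: float = 10, aspect_ratio: float = 0.618) -> tuple[float, float]:
    """Return (width, height) in inches, with height = width * aspect_ratio."""
    return (width, width * aspect_ratio)

size = figure_size()
print(size)
```

A tuple like this has the shape matplotlib expects for its figsize argument.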

chemoecology_tools.setup_plotting_style()

Configure global matplotlib and seaborn plotting style settings.

Return type:

None