Advanced Usage
This guide covers advanced perch features including 3D data analysis, FITS file integration, and specialized filtering techniques.
Working with 3D Data
The workflow for 3D data cubes is the same as 2D, but you can compute higher-dimensional homology groups:
import numpy as np
import matplotlib.pyplot as plt
from perch.ph import PH
# Load 3D data cube (nz, ny, nx)
data_cube = np.random.randn(50, 100, 100)
# Compute H₀, H₁, and H₂
ph = PH.compute_hom(data_cube, max_Hi=2, verbose=True)
# Filter H₀ structures (peaks)
peaks = ph.filter(min_life=5.0, dimension=0)
peaks.compute_segment_hierarchy(data_cube, verbose=True)
# Filter H₂ structures (voids)
voids = ph.filter(min_life=5.0, dimension=2)
voids.compute_segment_hierarchy(data_cube, verbose=True)
# Visualize a slice
slice_idx = data_cube.shape[0] // 2
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
axes[0].imshow(data_cube[slice_idx], cmap='viridis')
axes[0].set_title('Data Slice')
axes[1].imshow(peaks.struc_map[slice_idx], cmap='tab10')
axes[1].set_title('Peaks (H₀)')
axes[2].imshow(voids.struc_map[slice_idx], cmap='tab10_r')
axes[2].set_title('Voids (H₂)')
plt.show()
Working with FITS Files
perch integrates with astropy for astronomical FITS files and World Coordinate Systems (WCS):
from astropy.io import fits
from astropy.wcs import WCS
from perch.ph import PH
# Load FITS file with WCS
hdulist = fits.open('observation.fits')
data = hdulist[0].data
wcs = WCS(hdulist[0].header)
# Compute with WCS (preserved throughout pipeline)
ph = PH.compute_hom(data, max_Hi=1, wcs=wcs, verbose=True)
# Filter and segment
peaks = ph.filter(min_life=1.0, dimension=0)
peaks.compute_segment_hierarchy(data, verbose=True)
# Access WCS coordinates
print(peaks.centroid_coord) # Centroids in WCS coordinates (e.g., RA/Dec)
The WCS information is automatically preserved through filtering and segmentation, allowing you to work in physical coordinates rather than pixel coordinates.
Advanced Filtering Techniques
Noise-Normalized Filtering
When you have an estimate of the local noise level (e.g., from observational data or error propagation), you can filter structures based on their significance relative to the local noise environment. This is particularly useful for data with spatially varying noise levels.
# Assume you have a noise map from your observations
noise_map = np.random.rand(*data.shape) * 0.5 # Example
# Filter by persistence/noise ratio at death pixel
gens = ph.generators
death_pts = gens[:, 6:9].astype(int) # Death pixel coordinates
death_noise = noise_map[death_pts[:, 0], death_pts[:, 1]] # For 2D
pers_norm = np.abs(gens[:, 1] - gens[:, 2]) / death_noise
# Select structures with normalized persistence > 10
peaks = ph.filter(mask=np.where(pers_norm > 10))
This approach identifies structures that stand out significantly above the local noise, rather than using a global threshold that might miss faint but real structures in low-noise regions or include noise in high-noise regions.
Combining Multiple Filters
Most real workflows combine several filtering criteria, since no single cut
captures everything you want. A typical filter picks a homology dimension
(which kind of structure to keep), applies a persistence cut (min_life) to
suppress noise, optionally adds bounds on the data values themselves
(min_birth / min_death) to restrict to physically meaningful intensity
ranges, and may add a normalized-persistence cut (min_life_norm_birth /
min_life_norm_death) when contrast varies across the field. filter
applies the cuts conjunctively, so any structure that fails any one of them is
dropped.
# Select structures meeting multiple criteria
peaks = ph.filter(
dimension=0,
min_life=5.0, # Minimum persistence
min_birth=10.0, # Minimum birth value
min_life_norm_birth=0.1 # Minimum normalized persistence
)
Parameter reference:
dimension: homology group to keep (0for peaks/H₀,1for loops/H₁,2for voids/H₂).min_life/max_life: persistence (lifetime) bounds. This is the most common knob — everything belowmin_lifeis treated as noise.min_birth/max_birth: bounds on the birth value. For H₀ this is the brightest pixel inside the structure; useful for excluding faint detections or capping above a saturation level.min_death/max_death: bounds on the death value (the threshold at which the structure merges into a larger one).min_life_norm_birth/min_life_norm_death: persistence normalized by birth (resp. death). Useful when absolute persistence varies across the field but the relative contrast of a feature is the more meaningful significance measure.mask: when supplied, all other cuts are ignored and only the given boolean / index mask is applied to the generators — handy for plugging in a custom selection (e.g. the noise-normalized persistence above).
Alexander Duality for Large Datasets
For very large 3D datasets (billions of voxels), computing H₂ directly can be slow. Alexander duality tells us that the H₂ classes of a superlevel-set filtration are in one-to-one correspondence with the H₀ classes of the same filtration on the negated data — intuitively, voids in the original become peaks once the sign is flipped. H₀ is dramatically cheaper to compute than H₂, so for void-finding at scale we recommend the inverted-data route:
# For large datasets: compute voids as H₀ of inverted data
# (equivalent to H₂ but much faster)
ph_voids = PH.compute_hom(-data_cube, verbose=True)
voids = ph_voids.filter(min_life=5.0, dimension=0)
voids.compute_segment_hierarchy(-data_cube, verbose=True)
This computes H₀ on the inverted data, which is mathematically equivalent to H₂ on the original data but with significantly better performance for massive datasets.
Hierarchical Analysis
Analyzing Parent-Child Relationships
After segmentation, you can explore the hierarchical relationships between structures:
# Get all structures and build full hierarchy
all_peaks = ph.strucs
all_peaks.compute_segment_hierarchy(data, verbose=True)
# Access hierarchy properties
print("Parent IDs:", all_peaks.parent)
print("Hierarchy levels:", all_peaks.level)
print("Trunk (most persistent):", all_peaks.trunk)
print("Leaves (no children):", all_peaks.leaves)
# Analyze individual structure hierarchy
sid = all_peaks.id[0]
structure = all_peaks.structures[sid]
print(f"Structure {sid}:")
print(f" Parent: {structure.parent}")
print(f" Children: {structure.children}")
print(f" Descendants: {structure.descendants}")
print(f" Level: {structure.level}")
Tips for Large Datasets
Memory management: Use
clear_indices=True(default) incompute_segment_hierarchy()Alexander duality: For 3D voids in billion-voxel datasets, use H₀ of inverted data instead of H₂
Filtering before segmentation: Apply aggressive filtering before segmentation to reduce computation
Export intermediate results: Save generators after computation to avoid recomputing during exploration
# Efficient workflow for large datasets
ph = PH.compute_hom(large_data, max_Hi=0, verbose=True)
ph.export_generators('large_ph.txt', odir='./output/')
# Later: load and filter without recomputing
ph_loaded = PH.load_from('large_ph.txt', odir='./output/',
data=large_data, wcs=wcs)
peaks = ph_loaded.filter(min_life=50.0, dimension=0)
peaks.compute_segment_hierarchy(large_data, verbose=True, clear_indices=True)
Further Resources
See Quickstart Guide for basic 2D workflow
Check Conceptual Introduction for theoretical background
Browse API for complete documentation
Explore
examples/run_perch_3d.ipynbfor detailed 3D examples