Advanced Usage ============== This guide covers advanced perch features including 3D data analysis, FITS file integration, and specialized filtering techniques. Working with 3D Data -------------------- The workflow for 3D data cubes is the same as 2D, but you can compute higher-dimensional homology groups: .. code-block:: python import numpy as np import matplotlib.pyplot as plt from perch.ph import PH # Load 3D data cube (nz, ny, nx) data_cube = np.random.randn(50, 100, 100) # Compute H₀, H₁, and H₂ ph = PH.compute_hom(data_cube, max_Hi=2, verbose=True) # Filter H₀ structures (peaks) peaks = ph.filter(min_life=5.0, dimension=0) peaks.compute_segment_hierarchy(data_cube, verbose=True) # Filter H₂ structures (voids) voids = ph.filter(min_life=5.0, dimension=2) voids.compute_segment_hierarchy(data_cube, verbose=True) # Visualize a slice slice_idx = data_cube.shape[0] // 2 fig, axes = plt.subplots(1, 3, figsize=(15, 5)) axes[0].imshow(data_cube[slice_idx], cmap='viridis') axes[0].set_title('Data Slice') axes[1].imshow(peaks.struc_map[slice_idx], cmap='tab10') axes[1].set_title('Peaks (H₀)') axes[2].imshow(voids.struc_map[slice_idx], cmap='tab10_r') axes[2].set_title('Voids (H₂)') plt.show() Working with FITS Files ------------------------ perch integrates with astropy for astronomical FITS files and World Coordinate Systems (WCS): .. code-block:: python from astropy.io import fits from astropy.wcs import WCS from perch.ph import PH # Load FITS file with WCS hdulist = fits.open('observation.fits') data = hdulist[0].data wcs = WCS(hdulist[0].header) # Compute with WCS (preserved throughout pipeline) ph = PH.compute_hom(data, max_Hi=1, wcs=wcs, verbose=True) # Filter and segment peaks = ph.filter(min_life=1.0, dimension=0) peaks.compute_segment_hierarchy(data, verbose=True) # Access WCS coordinates print(peaks.centroid_coord) # Centroids in WCS coordinates (e.g., RA/Dec) The WCS information is automatically preserved through filtering and segmentation, allowing you to work in physical coordinates rather than pixel coordinates. Advanced Filtering Techniques ------------------------------ Noise-Normalized Filtering ~~~~~~~~~~~~~~~~~~~~~~~~~~~ When you have an estimate of the local noise level (e.g., from observational data or error propagation), you can filter structures based on their significance relative to the local noise environment. This is particularly useful for data with spatially varying noise levels. .. code-block:: python # Assume you have a noise map from your observations noise_map = np.random.rand(*data.shape) * 0.5 # Example # Filter by persistence/noise ratio at death pixel gens = ph.generators death_pts = gens[:, 6:9].astype(int) # Death pixel coordinates death_noise = noise_map[death_pts[:, 0], death_pts[:, 1]] # For 2D pers_norm = np.abs(gens[:, 1] - gens[:, 2]) / death_noise # Select structures with normalized persistence > 10 peaks = ph.filter(mask=np.where(pers_norm > 10)) This approach identifies structures that stand out significantly above the local noise, rather than using a global threshold that might miss faint but real structures in low-noise regions or include noise in high-noise regions. Combining Multiple Filters ~~~~~~~~~~~~~~~~~~~~~~~~~~~ Most real workflows combine several filtering criteria, since no single cut captures everything you want. A typical filter picks a homology ``dimension`` (which kind of structure to keep), applies a persistence cut (``min_life``) to suppress noise, optionally adds bounds on the data values themselves (``min_birth`` / ``min_death``) to restrict to physically meaningful intensity ranges, and may add a normalized-persistence cut (``min_life_norm_birth`` / ``min_life_norm_death``) when contrast varies across the field. ``filter`` applies the cuts conjunctively, so any structure that fails any one of them is dropped. .. code-block:: python # Select structures meeting multiple criteria peaks = ph.filter( dimension=0, min_life=5.0, # Minimum persistence min_birth=10.0, # Minimum birth value min_life_norm_birth=0.1 # Minimum normalized persistence ) Parameter reference: - ``dimension``: homology group to keep (``0`` for peaks/H₀, ``1`` for loops/H₁, ``2`` for voids/H₂). - ``min_life`` / ``max_life``: persistence (lifetime) bounds. This is the most common knob — everything below ``min_life`` is treated as noise. - ``min_birth`` / ``max_birth``: bounds on the birth value. For H₀ this is the brightest pixel inside the structure; useful for excluding faint detections or capping above a saturation level. - ``min_death`` / ``max_death``: bounds on the death value (the threshold at which the structure merges into a larger one). - ``min_life_norm_birth`` / ``min_life_norm_death``: persistence normalized by birth (resp. death). Useful when absolute persistence varies across the field but the relative contrast of a feature is the more meaningful significance measure. - ``mask``: when supplied, all other cuts are ignored and only the given boolean / index mask is applied to the generators — handy for plugging in a custom selection (e.g. the noise-normalized persistence above). Alexander Duality for Large Datasets ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ For very large 3D datasets (billions of voxels), computing H₂ directly can be slow. `Alexander duality `_ tells us that the H₂ classes of a superlevel-set filtration are in one-to-one correspondence with the H₀ classes of the same filtration on the *negated* data — intuitively, voids in the original become peaks once the sign is flipped. H₀ is dramatically cheaper to compute than H₂, so for void-finding at scale we recommend the inverted-data route: .. code-block:: python # For large datasets: compute voids as H₀ of inverted data # (equivalent to H₂ but much faster) ph_voids = PH.compute_hom(-data_cube, verbose=True) voids = ph_voids.filter(min_life=5.0, dimension=0) voids.compute_segment_hierarchy(-data_cube, verbose=True) This computes H₀ on the inverted data, which is mathematically equivalent to H₂ on the original data but with significantly better performance for massive datasets. Hierarchical Analysis ---------------------- Analyzing Parent-Child Relationships ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ After segmentation, you can explore the hierarchical relationships between structures: .. code-block:: python # Get all structures and build full hierarchy all_peaks = ph.strucs all_peaks.compute_segment_hierarchy(data, verbose=True) # Access hierarchy properties print("Parent IDs:", all_peaks.parent) print("Hierarchy levels:", all_peaks.level) print("Trunk (most persistent):", all_peaks.trunk) print("Leaves (no children):", all_peaks.leaves) # Analyze individual structure hierarchy sid = all_peaks.id[0] structure = all_peaks.structures[sid] print(f"Structure {sid}:") print(f" Parent: {structure.parent}") print(f" Children: {structure.children}") print(f" Descendants: {structure.descendants}") print(f" Level: {structure.level}") Tips for Large Datasets ------------------------ 1. **Memory management**: Use ``clear_indices=True`` (default) in ``compute_segment_hierarchy()`` 2. **Alexander duality**: For 3D voids in billion-voxel datasets, use H₀ of inverted data instead of H₂ 3. **Filtering before segmentation**: Apply aggressive filtering before segmentation to reduce computation 4. **Export intermediate results**: Save generators after computation to avoid recomputing during exploration .. code-block:: python # Efficient workflow for large datasets ph = PH.compute_hom(large_data, max_Hi=0, verbose=True) ph.export_generators('large_ph.txt', odir='./output/') # Later: load and filter without recomputing ph_loaded = PH.load_from('large_ph.txt', odir='./output/', data=large_data, wcs=wcs) peaks = ph_loaded.filter(min_life=50.0, dimension=0) peaks.compute_segment_hierarchy(large_data, verbose=True, clear_indices=True) Further Resources ----------------- - See :doc:`quickstart` for basic 2D workflow - Check :doc:`concepts` for theoretical background - Browse :doc:`api` for complete documentation - Explore ``examples/run_perch_3d.ipynb`` for detailed 3D examples