I'd like people's intuition on whether this dataset needs batch correction. It's single-nucleus RNA sequencing of the human hippocampus across many donors. For some donors, the cells are confined to the corners of each cell type cluster on the UMAP. After batch correction with Harmony, the clusters look better integrated by donor. Am I erasing real biological variation here? Should I be batch correcting this data by donor? Is there a more rigorous way to test whether a dataset needs batch correction than the UMAP eye test? Let me know.
My goal is to find and annotate rare cell populations shared across donors.
[Image: UMAP before batch correction]
[Image: UMAP after batch correction]
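
On the question of a more rigorous test than the UMAP eye test: one option is to quantify donor mixing in the neighbor graph rather than judge it visually. Below is a minimal sketch, not a definitive method. It assumes you have a low-dimensional embedding (for example the PCA coordinates before correction and the Harmony-corrected coordinates after) and one donor label per cell. The function name `neighbor_donor_entropy` and the parameter `k` are hypothetical, made up for illustration. The score is similar in spirit to neighborhood mixing metrics such as LISI, but it is not an implementation of any of them. It uses brute-force distances for clarity, so a real dataset would likely need subsampling or a proper neighbor-search library.

```python
import numpy as np


def neighbor_donor_entropy(embedding, donors, k=30):
    """Per-cell normalized entropy of donor labels among the k nearest neighbors.

    Values near 0 mean a cell's neighbors come from a single donor;
    values near 1 mean its neighbors are spread evenly across all donors.
    """
    embedding = np.asarray(embedding, dtype=float)
    n_cells = embedding.shape[0]
    if not 0 < k < n_cells:
        raise ValueError("k must be between 1 and n_cells - 1")

    # Encode donor labels as integers 0..n_donors-1.
    donor_codes = np.unique(np.asarray(donors), return_inverse=True)[1]
    n_donors = int(donor_codes.max()) + 1
    if n_donors < 2:
        raise ValueError("need at least two donors to measure mixing")

    # Brute-force squared Euclidean distances (O(n^2) memory; sketch only).
    sq_norms = (embedding ** 2).sum(axis=1)
    dist2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * embedding @ embedding.T
    np.fill_diagonal(dist2, np.inf)  # exclude each cell from its own neighbors

    # Indices of the k nearest neighbors of every cell (unordered).
    neighbors = np.argpartition(dist2, k - 1, axis=1)[:, :k]

    # Count how many neighbors of each cell belong to each donor.
    counts = np.zeros((n_cells, n_donors))
    rows = np.arange(n_cells)[:, None]
    np.add.at(counts, (rows, donor_codes[neighbors]), 1)

    # Shannon entropy of the neighbor donor proportions, scaled to [0, 1].
    p = counts / k
    log_p = np.log(p, where=p > 0, out=np.zeros_like(p))
    entropy = -(p * log_p).sum(axis=1)
    return entropy / np.log(n_donors)


if __name__ == "__main__":
    # Synthetic illustration only: three hypothetical donors, 100 cells each.
    rng = np.random.default_rng(0)
    donors = np.repeat(["donor_a", "donor_b", "donor_c"], 100)
    mixed = rng.normal(size=(300, 10))
    # Shift each donor far away from the others to mimic a strong donor effect.
    separated = mixed + np.repeat(np.arange(3), 100)[:, None] * 50.0
    print("mixed:    ", neighbor_donor_entropy(mixed, donors, k=15).mean())
    print("separated:", neighbor_donor_entropy(separated, donors, k=15).mean())
```

Computing this within each cell type, before and after Harmony, would give a number to compare instead of a picture. Two caveats: unequal cell counts per donor lower the achievable score, and the score only measures mixing. It cannot say whether the donor effect being removed is technical or biological.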