Brain Cell Types & Clustering Analysis

This document summarizes work on analyzing unlabeled brain cell data drawn from three main cell types and their sub-types. It covers clustering and feature selection, using regularized logistic regression to identify the genes that best distinguish cell types. It also examines how hyper-parameters, such as the number of principal components passed to T-SNE and the regularization parameter of the models, affect the results. Reproducibility and multiple testing are discussed as important considerations.

Uploaded by

Begad Hosni

6.419x Module 2 Report

BegadE

May 2022

Problem 2: Larger unlabeled subset

Part 1: Visualization
1. (3 points) Provide at least one visualization which clearly shows the existence of three main brain cell types as
described by the scientist, and explain how it shows this. Your visualization should support the idea that cells from a
different group (for example, excitatory vs inhibitory) can differ greatly.
Solution:
2. (4 points) Provide at least one visualization which supports the claim that within each of the three types, there are
numerous possible sub-types for a cell. In your visualization, highlight which of the three main types these sub-types
belong to. Again, explain how your visualization supports the claim.
Solution:
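
One way the two visualizations in this part could be prepared is sketched below. It is an illustration only, not the report's solution: the expression matrix is replaced by synthetic data (an assumption), and the choices of 50 PCs, perplexity 30, and K-means with 3 and 9 clusters are placeholders rather than values taken from the report.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Hypothetical stand-in for the expression matrix (cells x genes):
# 3 well-separated main types, each containing 3 closer sub-types.
n_genes, cells_per_subtype = 200, 30
blocks = []
for _ in range(3):
    main_center = rng.normal(0.0, 10.0, size=n_genes)
    for _ in range(3):
        sub_center = main_center + rng.normal(0.0, 2.0, size=n_genes)
        blocks.append(sub_center + rng.normal(size=(cells_per_subtype, n_genes)))
X = np.vstack(blocks)

# Project onto the top principal components, then embed in 2-D with t-SNE.
pcs = PCA(n_components=50, random_state=0).fit_transform(X)
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(pcs)

# Coarse clustering for the main types, finer clustering for the sub-types.
main_type = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(pcs)
sub_type = KMeans(n_clusters=9, n_init=10, random_state=0).fit_predict(pcs)

# For each sub-type cluster, the main type that most of its cells fall in.
parent_of_subtype = {
    int(s): int(np.bincount(main_type[sub_type == s]).argmax())
    for s in np.unique(sub_type)
}
```

A scatter plot of `embedding` colored by `main_type` addresses the first question; coloring by `sub_type` and using one marker shape per entry of `parent_of_subtype` addresses the second.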

Part 2: Unsupervised Feature Selection

1. (4 points) Using your clustering method(s) of choice, find a suitable clustering for the cells. Briefly explain how you
chose the number of clusters by appropriate visualizations and/or numerical findings.
Solution:
2. (6 points) We will now treat your cluster assignments as labels for supervised learning. Fit a logistic regression
model to the original data (not principal components), with your clustering as the target labels. Since the data is
high-dimensional, make sure to regularize your model using your choice of ℓ1, ℓ2, or elastic net, and separate the data
into training and validation or use cross-validation to select your model. Report your choice of regularization parameter
and validation performance.
Solution:
3. (9 points) Select the features with the top 100 corresponding coefficient values (since this is a multi-class model,
you can rank the coefficients using the maximum absolute value over classes, or the sum of absolute values). Take the
evaluation training data in p2evaluation and use a subset of the genes consisting of the features you selected. Train
a logistic regression classifier on this training data, and evaluate its performance on the evaluation test data. Report
your score.
Solution:
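
The three steps of this part (cluster, fit a regularized classifier on the cluster labels, keep the top-ranked genes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the report's solution: the data are synthetic, the ℓ2 penalty and the grid of `C` values are arbitrary choices, `top_k` is set to 10 instead of 100 because the toy data have only 100 genes, and a train/test split of the synthetic data stands in for the files in p2evaluation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 300 cells x 100 genes, 3 groups that
# differ only in the first 10 genes.
n_cells, n_genes, n_informative = 300, 100, 10
true_labels = rng.integers(0, 3, size=n_cells)
X = rng.normal(size=(n_cells, n_genes))
group_means = rng.normal(0.0, 4.0, size=(3, n_informative))
X[:, :n_informative] += group_means[true_labels]

# Step 1: cluster assignments become the supervised targets.
cluster_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Step 2: l2-regularized logistic regression on the original features,
# with C (inverse regularization strength) chosen by 5-fold cross-validation.
search = GridSearchCV(
    LogisticRegression(max_iter=2000), {"C": [0.01, 0.1, 1.0]}, cv=5
)
search.fit(X, cluster_labels)
best_C = search.best_params_["C"]
cv_accuracy = search.best_score_

# Step 3: rank genes by the maximum absolute coefficient over classes.
coef = search.best_estimator_.coef_          # shape (n_classes, n_genes)
importance = np.abs(coef).max(axis=0)
top_k = 10                                   # the assignment asks for 100
selected = np.argsort(importance)[::-1][:top_k]

# Stand-in for the evaluation data: a labeled train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, true_labels, test_size=0.3, random_state=0, stratify=true_labels
)
clf = LogisticRegression(max_iter=2000).fit(X_train[:, selected], y_train)
test_accuracy = clf.score(X_test[:, selected], y_test)
```

Replacing `np.abs(coef).max(axis=0)` with `np.abs(coef).sum(axis=0)` gives the sum-of-absolute-values ranking mentioned in the question.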

Problem 3: Influence of Hyper-parameters
1. (3 points) When we created the T-SNE plot in Problem 1, we ran T-SNE on the top 50 PCs of the data. But
we could have easily chosen a different number of PCs to represent the data. Run T-SNE using 10, 50, 100, 250,
and 500 PCs, and plot the resulting visualization for each. What do you observe as you increase the number of PCs
used?
Solution:
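
A possible way to set up this experiment is sketched below, as an illustration rather than the report's solution. The data are synthetic (an assumption), and the list of PC counts is shortened to fit the toy matrix; the silhouette score on known group labels is added here only as a rough numerical summary of how separated the groups look in each embedding.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score

# Hypothetical stand-in data: 200 cells x 120 genes in 3 groups.
X, groups = make_blobs(
    n_samples=200, n_features=120, centers=3, cluster_std=3.0, random_state=0
)

# The assignment's list (10, 50, 100, 250, 500) needs at least 500 cells
# and 500 genes; it is shortened here to fit the toy data.
pc_counts = (10, 50, 100)

# Fit PCA once; the first k columns are the scores on the top k PCs.
scores = PCA(n_components=max(pc_counts), random_state=0).fit_transform(X)

embeddings, separation = {}, {}
for k in pc_counts:
    tsne = TSNE(n_components=2, perplexity=30, random_state=0)
    embeddings[k] = tsne.fit_transform(scores[:, :k])
    # Higher silhouette means the known groups are more separated in 2-D.
    separation[k] = silhouette_score(embeddings[k], groups)
```

Each entry of `embeddings` can then be drawn as its own scatter plot, side by side, to compare the effect of the number of PCs.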

2. (13 points) Pick three hyper-parameters below and analyze how changing the hyper-parameters affects the conclusions
that can be drawn from the data. Please choose at least one hyper-parameter from each of the two categories
(visualization and clustering/feature selection). At minimum, evaluate the hyper-parameters individually, but you may
also evaluate how joint changes in the hyper-parameters affect the results. You may use any of the datasets we have
given you in this project. For visualization hyper-parameters, you may find it productive to augment your analysis with
experiments on synthetic data, though we request that you use real data in at least one demonstration.
Solution:
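
As an illustration of evaluating hyper-parameters individually, the sketch below sweeps one clustering hyper-parameter (the number of K-means clusters) and one visualization hyper-parameter (the T-SNE perplexity) on synthetic data. The choice of these two hyper-parameters, the value grids, and the use of the silhouette score as the summary are all assumptions made for the example, not choices taken from the report.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score

# Hypothetical synthetic data: 240 points in 20 dimensions, 4 true groups.
X, y = make_blobs(
    n_samples=240, n_features=20, centers=4, cluster_std=1.0, random_state=0
)

# Clustering hyper-parameter: number of clusters k.
k_scores = {}
for k in (2, 3, 4, 5, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    k_scores[k] = silhouette_score(X, labels)
best_k = max(k_scores, key=k_scores.get)

# Visualization hyper-parameter: t-SNE perplexity (must stay below n_samples).
perplexity_scores = {}
for perplexity in (5, 30, 80):
    emb = TSNE(n_components=2, perplexity=perplexity, random_state=0).fit_transform(X)
    # How well the true groups separate in the 2-D embedding.
    perplexity_scores[perplexity] = silhouette_score(emb, y)
```

Joint changes can be explored the same way by nesting the two loops and recording one score per pair of settings.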

References
[1] R. L. Wasserstein and N. A. Lazar, “The ASA statement on p-values: context, process, and purpose,” The American
Statistician, vol. 70, no. 2, pp. 129-133, 2016.
[2] J. P. A. Ioannidis, “Why most published research findings are false,” PLoS Medicine, Aug. 2005. Retrieved June
7, 2022, from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1182327/
[3] National Academies of Sciences, Engineering, and Medicine, Reproducibility and Replicability in Science.
Washington, DC: The National Academies Press, 2019. https://doi.org/10.17226/25303
