Python assignment

This document presents an assignment submitted by Muhammad Ijaz Ahmad for a computer programming course at the University of Engineering and Technology Lahore, Narowal Campus. It includes Python code for generating random DNA sequences, creating DNA profiles in FASTA format, and comparing these profiles to a simulated crime scene sample, culminating in a histogram plot of similarity scores. The output section displays generated DNA sequences for 50 individuals.


UNIVERSITY OF ENGINEERING AND TECHNOLOGY LAHORE, NAROWAL CAMPUS

Assignment #2

Submitted to:
Mam Fatima Shahzadi

Submitted by:
Muhammad Ijaz Ahmad

Registration No.:
2023-BME-143

Course Name:
Introduction to Computer Programming for Data Science

Course Code:
CS-103

Department of Biomedical Engineering


Code is:
import numpy as np
import matplotlib.pyplot as plt
import random

# Function to generate a random DNA sequence
def generate_random_dna_sequence(length):
    nucleotides = ['A', 'T', 'C', 'G']
    return ''.join(random.choices(nucleotides, k=length))

# Function to generate a DNA profile in FASTA format for an individual
def generate_dna_profile(name, sequence):
    fasta_format = f">{name}\n{sequence}\n"
    return fasta_format

# Number of individuals and length of DNA sequence
num_individuals = 50
sequence_length = 100

# Generate DNA profiles for different individuals
dna_profiles = {}
for i in range(num_individuals):
    name = f"Individual_{i+1}"
    sequence = generate_random_dna_sequence(sequence_length)
    dna_profiles[name] = sequence

# Write DNA profiles to a FASTA file
fasta_file = "dna_profiles.fasta"
with open(fasta_file, 'w') as file:
    for name, sequence in dna_profiles.items():
        file.write(generate_dna_profile(name, sequence))
print("DNA profiles have been generated and saved in 'dna_profiles.fasta'.")

# Data Loading and Preprocessing
# Each record occupies exactly two lines: a '>' header and one sequence line
dna_profiles_loaded = {}
with open(fasta_file, 'r') as file:
    lines = file.readlines()
    for i in range(0, len(lines), 2):
        name = lines[i].strip()[1:]
        sequence = lines[i+1].strip()
        dna_profiles_loaded[name] = sequence

# Encode DNA sequences into numerical arrays
alphabet = ['A', 'T', 'C', 'G']
encoded_profiles = np.array([
    [alphabet.index(base) for base in sequence]
    for sequence in dna_profiles_loaded.values()
])

# Generate DNA sample for crime scene simulation
crime_scene_sample = generate_random_dna_sequence(sequence_length)

# Encode the crime scene sample into a numerical array
encoded_crime_scene_sample = np.array([alphabet.index(base) for base in crime_scene_sample])

# Sequence Comparison: count the positions where each profile matches the sample
similarity_scores = [np.sum(encoded_crime_scene_sample == profile) for profile in encoded_profiles]
# Plot the result on a histogram
plt.figure(figsize=(10, 6))
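# Bin edges 0..sequence_length+1 give every integer score, including a perfect match, its own bin
plt.hist(similarity_scores, bins=np.arange(sequence_length + 2), color='skyblue', edgecolor='black')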
plt.xlabel('Similarity Score')
plt.ylabel('Frequency')
plt.title('DNA Profile Similarity to Crime Scene Sample')
plt.show()
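
The listing above plots the distribution of scores but does not report which profile is closest to the crime scene sample. A minimal sketch of that step follows. The names `encode`, `rank_profiles`, and `toy_profiles` are hypothetical and not part of the assignment, and the sketch assumes all sequences have the same length, as they do in the listing.

```python
import numpy as np

# Hypothetical helper names; not part of the original assignment.
ALPHABET = "ATCG"  # same base order as `alphabet` in the listing


def encode(sequence):
    """Map each base to its index in ALPHABET (A=0, T=1, C=2, G=3)."""
    return np.array([ALPHABET.index(base) for base in sequence])


def rank_profiles(profiles, sample):
    """Return (name, score) pairs sorted from most to least similar.

    The score is the number of positions where a profile and the sample
    carry the same base, the same measure used in the listing.
    Assumes every sequence has the same length as the sample.
    """
    names = list(profiles)
    encoded = np.array([encode(profiles[name]) for name in names])
    # Broadcasting compares the sample against every profile row at once.
    scores = (encoded == encode(sample)).sum(axis=1)
    # Stable sort keeps the original order among tied scores.
    order = np.argsort(-scores, kind="stable")
    return [(names[i], int(scores[i])) for i in order]


# Toy data for illustration only.
toy_profiles = {
    "Individual_1": "ATCGATCG",
    "Individual_2": "ATCGTTTT",
    "Individual_3": "GGGGGGGG",
}
ranking = rank_profiles(toy_profiles, "ATCGATCC")
print(ranking[0])  # closest profile and its score
```

With the assignment's data, the call would be `rank_profiles(dna_profiles_loaded, crime_scene_sample)`.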

Output is:
>Individual_1

ATACAGGTCCCGTCAGAGAGTTCGCAATGCATCACATGAGAAACGTGGATTCGCATTCTGGCCATAAGATGGGGA
TACGCGAGAGATCACCCCGAATGAA

>Individual_2

CAACATGTAAGTGCAACTGGACCTGAGCGAGAAGTCGGTAGATCTGACAACCCAATCAGTCGTGCCCCAGACTCA
CAAACCCTAGGCCAGGCGGCGGTGA

>Individual_3

AAATACGCGCTGGTATTGCTTGGTATAGGCTTTGCGAGGACTCAAAGTTTTTTCGAGAGTTGCGGAGCATCCGCTT
GGTGCTCAATCGTAATACTTTTCC

>Individual_4

GAGTTAGACCACCGAGGTATCTCTGATACGTGAGAGATCTTAAACCGCGTTCCTGGGGGGTGACAGACTTCAGAG
ACTAGTAAGCAGACGCAGTCCGCGG

>Individual_5

ACCCGATGGAGTCGCGGGGCGGCCCCGGCGCACTGCGGCGGCTCCAAAGAGCCCCGGTCAGGGCAATGATTGACC
ATAGACTGGGTGCGTGGTACGTTCC

>Individual_6

CGGATACACGCCATACCTCAGTAAAAAGTTATGGCGATAAAATAAAGTGTACCCCTTTCCATCTTGTTACTCGCAA
GTTCCTTGAGGGAAAAAATACACT

>Individual_7

TGCCTCGCCCGCTATGCTAATAAGATTTGGCCCCCACAAGCTCATGTCTATCGTGAGGAACATCATCTGTTAGGAC
CGCACGCAAGAGGATATCCAAGAT

>Individual_8

CTCGTGATTTTATTAGTCTTATCCACTCAGGCTTTTGAGTATTTATTTAGGGTACCACGCGCTGCAGAGTTATTCCTG
AATCTAGCACGGTTTCAAGGAG

>Individual_9

CTTGCTCCAATAACATGTTGGCAACTAAATGAACCCGGAAAGCGCTTCTTGGCAGGGAGGGATTAAGGACTAACA
GCCTACATGATCCGAAACGGTTAAG

>Individual_10

GGTCTCGTCAGTTGACCGCCATCAACCCGGGTACTTAACACTTTCCGCGAACACTATGCACTGATCAATTGACCCA
TCTAGAGTGTACCAGATTTGAAAT

>Individual_11

CTGTTTTGTCGAACAAGAACAGTATAGGGCACCAACGAAGGCGACCAAGGGCGGCGGCCCTTCACGTATACACCC
AAGCACCGGAAGTTTAGTTCAGGGA

>Individual_12

AATACTGCCAGTCGGCTGGTTCGCGTTCTATAATTCTAAGACATGATAACCCACGAGAGGTTTTAACGGGCGTGGG
AACATCCAGGTATCGGACCCCTGG

>Individual_13

CCGCTTTCCGACGGAAGATCTAAAGTAAAACCCCTTCGATTCATGATGCTGATCCAAGCTACAAACTGACGTCACA
GCCGCTAGGGGAGGACTAACGGTT

>Individual_14

CATCCGTAACGCGTGGAACCGCAATGGTATTTTAGGGCGTTTAGCTAAATAGCAATTATGGCTGCGTACAATCAGT
TGTGCCGATAGTTCCAATGGCGTG

>Individual_15

TTTCGAGATACGTACCGAAATCGACGCTCTATTGCGTTTCAGTATGTGCCCTGTTCTGGGAAGCTATATCGACTAAA
TGTAGCGACATAAGATGTAACCG

>Individual_16

TTTTAACGGGGGAAGTACTCACCCGGACTTAGGATGCGATACAGGGGGGGATTCATGTTTCTAACCATGAGCGGTC
ACGTGTTGGACTGAGGGTAGGCCC

>Individual_17

GTACTGAGGCAATGGCCGCGTCCCCGTGCTCGACAGTTGAGCAGCAACATGCGTTTGAATCTGTGAAACATCTTGT
TTACGATACTGTTACTTATACCTA

>Individual_18

CGTTGACCTTTTCAGGATCCGTTCACGCTGGGTTTTAACGTGCGCGCTTTATAATGTGGAAGGCGGGGGGGAGTCC
CACACGAGCTATTTACTCCAACCT

>Individual_19

GGAGCGTCACGTCTATAGCTTATGTCCTACTGGGGTCGGCCCATGAAAGAGGAGTATCTATGCGTTCTATGGTAGA
TTCCACGCTAAGACTCGGCCATCA

>Individual_20

GCGTTCCTGTTTCTCCCCTCGACTGGGTAATGGAGCCGACCTGCGAGCTCACGTTATCTTAAAATGAGCTGGTTCCC
AGTACTAAGTCCGGCCGACGTGT

>Individual_21

CCACTGCAAGCAGTTCAAATGCTACCGTGGGAATCGGCACATTTTAGGAGAACTTTTGTGACATTTCGAGACTGTC
AAGCACCCACTGCAAATCAATTAT

>Individual_22

ATAAGTTTACGGAGCAATAGCGTCTACAAAAAATTCAATTGTGCAGGCCCGGTGAACCTCTAGTCGTACATGAACG
CAACGGGTATTGAAGCCGCAAGAT

>Individual_23

TCGATTCGCTATGCCCCATCCACCTCGCAAATAGTCGCGTCCTCGTATACTTACCTTAGACTACACGGAGGTCTAAC
CGTTACGACGTAGGCAATTCGTG

>Individual_24

CGCCCCTCGCTGTGTTTGTCAATAGGTAATTTTTTGAGAACCAACGGCCTTATACTATCTCGGCTATCACACGGTAG
AAGTGGAACTCACACCAGGAGTC

>Individual_25

TGACAGCTCCCCAGAAGAGACACCAGATGTCTATCTAAGTTGTTTAGTCTGTGCTAGTTCTACCAGTAAACATGAT
TGAAAAAACTATCTTATTCTTCTG

>Individual_26

TAATTTGCTGCGCGCGTGCCCCTCTACGTCGGATAAGTAATCAGATGGTACTCCAAGCGTAAATCACTTCTCCATTT
CTACCTTGGGGTCTGATATAGTC

>Individual_27

ATATGAATCAGCTGTTCTGGCTTGTAGTAACGAGGGGCCATATGGAAAGTATCCTGCCAAACGGCAGGTAGAAAT
CACAGCGTCCGAGCTTACCATGGTT

>Individual_28

TAGTTCAGACGGAAGGGGGGAAACTCCAAGGGTCCGCAAGTCCAAAATGAGCACGCATGCCCAACTATACTGCAC
ATAAACTCATGTTTCGCGCGTTCGC

>Individual_29

AACAGACGTTTGCTAAAAGTGCATAAAGTCGGACCGCGCTGATTTAGTACGCAGGCCGGAATGAGACATAACAGG
ACTACAAGACTCTACAACCCGAGAT

>Individual_30

CAGTTTCAATAAGATAAGGCCAGTCTAATGGAGAATGAATCTGCAACTCCCTAAAGAAGTGGTGGCGCACGTCCG
TGGAGGCCAGCGCCCTAGAGATATC

>Individual_31

TTCTCCCCTAGCAGTAAATTCTTCAGACACCAGCTGGTACCTTAGTGAAGTAAAAGTGAACTTTCATTTGGTTACAG
CCCGGCAAGGATATACGGCTGAG

>Individual_32

AACTATATAGGAAGTCTCAGCCACCACAAAAGTTACGGGCAGCGGGGGTGCTCCGTCAAGTCCGAGAAGGCGAAT
AGCGCTGATAATTTATGTCACACTC

>Individual_33

AGGGAGGAAGTCCGAGAAAACAGTAATAAATACCTCGGGCGTAATAGATAAAGTACAAGCAATTGTCGTTAGTCA
ATCGATTGCGTGGTGGAGACTGCCG

>Individual_34

AGCTGGCCGGGCATGCGTTGCGTGGCCAAGCTTAGTACTCATGTAGCCGACGGAGCCTCATCAAACTGAGCGTCA
ATTAGTTGGAGGGTTGTAGTTAATA

>Individual_35

AGGTAGAGCTCACACAGGTATAAAGTGCTCAGTCAAAGGCAGGCCATAATCGCGGACGATTAATACCCATATCCA
TGCGAGTCCGTGGAGAGATCGTACA

>Individual_36

CAAATCTATGGGCCACCTAGCATAGCACACTGAAGAACGCGCATAAAGGTAGATACTAAGCGTTTATGGGATGTT
TTCGGGTTAGCGGCTAACTCATAAA

>Individual_37

GACTTCGAGCAAGACGCACTTAATCGATAATTGCCGCCTGTTTGGGTGCTCGATACATTGGGTCACACGCCTCCAT
GGGGAGTCGTGAGCGAAGGTCTGG

>Individual_38

CCCGATATTGGTTTTAAGCTCCACCCTTCACACGAGGTGCTAGAACCGTAGAGTTCCCTCGTATAGAGCTTTAAGT
GATTGAGATTAGGTGGAGGATTCC

>Individual_39

TTGGCTATCTGGACTTTATCTAAGTCTGAACGCTATCTAGATTGTATTTGCGCGGATTGAATGCAATCTCCGAATAG
ATGGCGCAGTGGGCAAAATACCC

>Individual_40

AGTACCCAAAATGTACGCCGACCACACCAGAAAAGTTTAGACTTGTTATCAGATACTTCGCAGGATTGGTGGGAC
GACACTCGCACGTTGTAATTTCTCT

>Individual_41

ATCTGATCATATTTTTGACTGTGAACGTATATACGTCCAGTCGCTTGGTTTATATTCTCCACGTCGCTTTGGCTCCAC
AGGCCGACTCGTAGTTCGGTGT

>Individual_42

CTTGGTTGTGGTCCGCCAAGGACATCTCACGCTCCAAGAGATGGGACTGGCCAATAGGCGCAGACAAAACTTCTC
CACCTCCAGCTTAAGGCAGGATCTT

>Individual_43

GGTACCAAGTTATCGTGTGTGTTCGCTAGGAAATTAGTTTTTAGTCATCGACAGGATTGCGTCCTACACTATTATGA
AAACAAGTCATAGGGCATAGCAC

>Individual_44

GATCCACCAATACTACCCATAGTATCAGTCCAACGCTCTCCCACAACGCGGGCTTGGGGGCACTTACGAGAGGGG
GTGGAACTAGGGGCTATCCTCGAAA

>Individual_45

TCCGCTGATAAATAACGACGTCAAGAAAAGCTCTGAAATATCCCTAAGTATTAATATGTTTTTGAGCTCTACGGCC
ATCAGTCACCCTACTAGCGTAGTC

>Individual_46

CTTCCCAAAAACAAACCTGTAAAGGCTCTTTCCGCTATAAGCGGATCTTCTTAAGAATTGTTCAGTCGCGCTTCACC
ATCTGCTGTGTATTACCAAAGTC

>Individual_47

TCTATACAATAACTGTGGCGTAGCAATACACTGACAAGCGGCTTTTTATGTAGCGGTCGGGCTTCCCTACTCAGAT
TGCCAGTAAATTACGGTCGTCTGT

>Individual_48

CTTGCATTCGCGCTTGTATGCTCTCTGTACTGATGAGAAAGGTCCCATCCTAGTGGCGCGGTACGTGTTTTCAGCCG
AAAAGGCCGTAACGTGGGGACGA

>Individual_49

TAGAGGCCCAGCAAACGTACTCCCTGTAGGTTACTGACGGACTCGCGTATGGCGATGCTTTGTAGGCATATTCCGT
AATCTTGATTAGAGTTGGTAAATA

>Individual_50

CAGAACATGAAACGGCGGCCAGAGAACGTACGGTCGCTTAGCAGACGCCCAGTTGGATTGTCGTAGAAACAGACG
CGCGATGTCCTTAGCCACAGGTTAC
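
The loading step in the code reads records as strict header and sequence line pairs, which works for the file the script writes itself. FASTA files from other sources often wrap a sequence over several lines, much as the output above is displayed. A hedged sketch of a reader that tolerates wrapped and blank lines is shown here. The function name `parse_fasta` and the sample text are illustrative assumptions, not part of the assignment.

```python
def parse_fasta(text):
    """Parse FASTA text into a {name: sequence} dict.

    Sequence lines are concatenated until the next '>' header, so records
    wrapped over several lines or separated by blank lines load correctly.
    """
    records = {}
    name = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


# Illustrative input only: one wrapped record and one single-line record.
example = """>Individual_A

ATAC
AGGT
>Individual_B
CAAC
"""
parsed = parse_fasta(example)
print(parsed)
```

To use it on the generated file, the text could be read with `open("dna_profiles.fasta").read()` and passed to `parse_fasta`.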

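Graph is:

[Figure: histogram titled "DNA Profile Similarity to Crime Scene Sample"; x-axis: Similarity Score; y-axis: Frequency]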
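
As a reference for reading the histogram, the score distribution expected from chance alone can be worked out. Because the code draws every base uniformly from four nucleotides, two independent sequences agree at any position with probability 1/4, so the score between a random profile and a random sample follows a Binomial(100, 0.25) distribution. The sketch below computes its mean and standard deviation. The variable names are illustrative, and the independence assumption holds only for simulated data like this, not for real DNA.

```python
from math import comb, sqrt

n = 100   # sequence_length in the code
p = 0.25  # chance that two independent, uniformly drawn bases agree

# Binomial probability of observing exactly k matching positions.
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

mean = sum(k * pk for k, pk in enumerate(pmf))
std = sqrt(sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf)))

print(f"expected score: {mean:.2f}, standard deviation: {std:.2f}")
# expected score: 25.00, standard deviation: 4.33
```

Scores for the 50 simulated individuals should therefore cluster around 25, and a score far above that range would indicate a match that chance is unlikely to produce.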
