USH2A SNPs


USH2A

Usherin (USH2A), a 71-exon gene located on human chromosome 1q41, encodes a 5202-residue multi-domain protein composed of a signal peptide, a PDZ1 binding domain (for USH1C and WHRN), 1 laminin N-terminal domain, 10 laminin EGF-like domains, 4 fibronectin type-III domains (for collagen IV and fibronectin), and 2 laminin G-like domains followed by 31 additional fibronectin type-III domains, all held at the cell exterior by a single transmembrane domain.

USH2Adomains.jpg

The usherin gene is expressed in the basement membrane of many (but not all) cell types, notably in ear interstereocilia ankles and below retinal pigment epithelial cells (Bruch's layer) and indeed in photoreceptor cells themselves in the connecting ciliary collar. When normal function is disrupted by mutations in both copies, the result is non-vestibular sensorineural deafness combined with degeneration of retinal photoreceptor cells, a condition called Usher syndrome type IIA.

In Usher Syndrome 2A, children are born not deaf but hard-of-hearing, able to detect lower tones better than higher frequencies. Only 10 dB of further loss occurs over the next several decades -- the hearing loss is essentially non-progressive because usherin plays its role in kinocilia, a developmental intermediary to cochlear stereocilia.

Mid-peripheral vision loss (of rods) has much later onset but can be compensated for by eye scanning that uses healthy parts of the retina (where acuity can still be 20/20). The field of vision narrows considerably over time but the loss can still be compensated for. Usherin is not expressed in rod outer segments and so is not directly involved in photoreception; rather, it plays a structural role in the inner segment collar system at the base of cilia and so in the transport of replenishment vesicles, critical to maintenance of vision because rod outer segments (and their 70 million rhodopsin molecules) are completely phagocytized by RPE every ten days.

Initially, only the first 21 of 71 exons were studied, but later it emerged that the gene is much longer and exquisitely sensitive to certain point mutations along the entire length of the protein (54 known sites), all leading -- in the homozygous or compound heterozygous state -- to essentially the same disease: 125, 163, 230, 268, 303, 334, 346, 352, 478, 536, 595, 644, 713, 759, 1212, 1349, 1486, 1572, 1665, 1757, 2080, 2086, 2106, 2169, 2238, 2265, 2266, 2292, 2562, 2875, 2886, 3088, 3099, 3115, 3124, 3144, 3199, 3411, 3504, 3521, 3590, 3835, 3868, 3893, 4054, 4115, 4232, 4433, 4439, 4487, 4592, 4624, 4795, 5031.

Many other coding variants are known as well (non-disease SNPs). This article evaluates a particular new SNP in USH2A using comparative genomics, but the methods here are applicable to any coding variation in any fibronectin domain. This mutation occurs as a non-hotspot G-->A transition causing a seemingly innocuous S-->N amino acid change at position 3743. This is just downstream from a glycosylation motif and very near known FN3 interdomain contact residues and a cytokine receptor motif (according to blastp CDD). This residue lies at a conserved turn initiating the seventh beta strand in the 22nd fibronectin domain, which is split across portions of exons 55-57.

USH2AFN3.jpg


This change will be shown significant (not plausibly neutral). S3743N could represent an adaptive innovation but is more likely deleterious. The gene is single-copy so there are no prospects for a paralogous copy taking over its function, though conceivably compensation could occur via interacting proteins of the Usher complex. Consequently the mutation, if present on both alleles or as a compound mutation, could well result in a new form of Usher syndrome type IIA.

Background

USH2A, while resembling just another shuffle of ubiquitous domains, has potential antecedents in pre-bilaterians -- very long matches can be recovered from sea anemone and hydra that are more than just concatenated fibronectin domains. Neither site of expression nor normal function has been established in cnidaria. The gene seems completely absent in all ecdysozoa and lophotrochozoa. It's not clear whether the need for a basement membrane organizer has been lost or whether some non-paralogous gene has taken over USH2A's role in these clades.

Sea urchin has a predicted gene XM_001179293 with many similarities: 33 fibronectin domains of type FN3, 10 EGF/laminins, and 3 concanavalin-like domains (which correspond to laminin G at SCOP). The curious pattern seen in human of 10 EGF/laminin domains followed by 4 FN3 domains followed by 2 laminin G domains and completed by the rest of the FN3 domains is also seen in echinoderm. The blastp alignment to human usherin extends for over 5,000 residues without major gaps.

Further, when each human FN3 domain is individually aligned to its best-blastp FN3 domain in sea urchin, astonishingly the order is perfectly conserved for all 35 domains with the exceptions of 31 and 34 (which have indels throwing off their top scores). Thus even though it seems that the number of FN3 domains might experience internal tandem repeat gain and loss, this is not the case despite nearly a billion years of roundtrip divergence time. It follows that many details of binding to other proteins must also be conserved in these orthologs.
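
The order check described above amounts to asking whether each human FN3 domain's top blastp hit in sea urchin carries the same ordinal number. A minimal sketch follows, assuming the best hits have already been tabulated. The table here is a made-up five-domain toy rather than the real 35-domain result, and `out_of_order` is an illustrative name, not part of any BLAST toolkit.

```python
def out_of_order(best_hit):
    """Return human domain numbers whose best urchin hit is a differently numbered domain.

    best_hit maps human FN3 domain number -> urchin FN3 domain number of the
    top-scoring blastp match (assumed to have been computed beforehand).
    """
    return sorted(h for h, u in best_hit.items() if h != u)


# Toy input only (not real scores): domains 1-3 and 5 hit their counterparts,
# domain 4 does not.
toy_best_hit = {1: 1, 2: 2, 3: 3, 4: 2, 5: 5}
exceptions = out_of_order(toy_best_hit)
print("domains breaking the order:", exceptions)
```

An empty list would mean perfect conservation of order. Any exceptions returned are the domains worth re-inspecting by hand for indels that depress their top scores.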

While primary sequence identity is low, the correspondence suggests the modular structure is exceedingly conserved, both in order and number of domains. While sea urchins are not noted for eyes or ears -- and known transcripts are reported from early blastula, gastrula and pluteus stages -- the basics of USH2A functionality must already have been locked in at the common ancestor.

Focusing now on human fibronectin FN3 domains (one of which envelops S3743), these are an ancient and exceedingly common domain in bilaterians, with 2% of the human proteome containing them (400 genes), often in multiple tandem copies having a role in cell adhesion. However they are not particularly well conserved in primary sequence, though the tertiary structure likely holds up well enough for the structure at residue 3743 to be determined with both serine and asparagine present.

Here the best blastp match elsewhere within the human proteome to the FN3 domain containing residue 3743 is a fibronectin domain of PTPRQ, a dimly related protein tyrosine phosphatase with merely 28% of the fibronectin residues matching.

Internally, the best match to the other 30 FN3 domains of USH2A is not noticeably better, suggesting very substantial divergence since these domains duplicated from a common source (either as internal tandems or domain shuffles). If, as suggested above, USH2A had already assumed its contemporary domain structure in pre-bilaterian metazoa, ample time has passed to produce the observed divergences between the individual domains.

While comparative genomics of intron positions and phases in a 71-exon protein is tedious to pursue by hand curation, the fibronectin domain containing S3743 falls across parts of three exons, whose phases are 12 and 21. This suggests that subsequent to the ancient intronation era, simple internal tandem duplications might not result in either a coherent reading phase or a coherent domain. Thus the domain structure of USH2A, while appearing somewhat arbitrary in its FN3 multiplicities, actually may be quite constrained by intronation against both contraction and expansion (in addition to whatever individual functional domain constraints exist).
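
The phase argument can be made concrete with a small sketch. It rests on an assumption about notation: "12 and 21" is read here as two exons whose flanking introns have phases (1, 2) and (2, 1). The function names are illustrative only. The rule itself is standard: an exon flanked by introns of phase p (upstream) and q (downstream) has length congruent to q - p modulo 3, so it can be duplicated or deleted on its own without shifting the reading frame only when p equals q.

```python
def exon_length_mod3(phase_5prime, phase_3prime):
    """Exon length modulo 3 implied by the phases of its flanking introns.

    Phase = number of bases of the current codon already read when the
    intron interrupts the coding sequence (0, 1 or 2).
    """
    return (phase_3prime - phase_5prime) % 3


def duplication_keeps_frame(phase_5prime, phase_3prime):
    """True if duplicating (or deleting) this single exon preserves the frame."""
    return exon_length_mod3(phase_5prime, phase_3prime) == 0


# Assumed reading of "12 and 21": two consecutive exons with these flanking phases.
for phases in [(1, 2), (2, 1)]:
    print(phases, "frame preserved on duplication:", duplication_keeps_frame(*phases))
```

Under this reading neither exon is symmetric, so neither can be tandemly duplicated alone. Only a unit spanning both exons (phase 1 at each end) would stay in frame, and that unit need not coincide with a domain boundary.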

As can be seen below, the internal fibronectin repeats most often have threonine (T) at the position corresponding to S3743, though other residues, not including the asparagine of S3743N, also occur. Here the numbering of better matches within the full length protein indicates they do not always correspond in match quality order to the linear order of the FN3 repeat within the protein.

FBN22 1     WSLPEKPNGLVSQYQLSRNGNLLFLGGSEEQNFTDKNLEPNS
            WSLPEKPNGLVSQYQLSRNGNLLFLGGSEEQNFTDKNLEPNS
FBN.. 3702  WSLPEKPNGLVSQYQLSRNGNLLFLGGSEEQNFTDKNLEPNS

FBN22 1     WSLPEKPNGLVSQYQLSRNGN-LLFLGGSEEQNFTDKNLEPNS
            WS+PEK NG++ +YQ+ + G  L+    ++ +  T   L+P +
FBN.. 3610  WSVPEKSNGVIKEYQIRQVGKGLIHTDTTDRRQHTVTGLQPYT

FBN22 1     WSLPEKPNGLVSQYQLSRNGNLL-FLGGSEEQNFTDKNLEPNS
            W  PE+ NG++  Y+L RN  L  F       N+TD+ L P S
FBN.. 4285  WIPPEQSNGIIQSYRLQRNEMLYPFSFDPVTFNYTDEELLPFS

FBN22 1     WSLPEKPNGLVSQYQLSRNGNLLFLGGSEEQNFTDKNLEPNS
            W  P K NG+++ Y +  +G L         N T  +L P + 
FBN.. 2553  WQHPRKSNGVITHYNIYLHGRLYLRTPGNVTNCTVMHLHPYT

FBN22 1     WSLPEKPNGLVSQYQLSRNGNLLFLGGSEEQNFTDKNLEPNS
            W  P  PNG +  Y+L R+G +++ G   E  + D  L P  
FBN.. 4464  WKPPRNPNGQIRSYELRRDGTIVYTG--LETRYRDFTLTPGV

FBN22 1     WSLPEKPNGLVSQYQLSRNGNLLFLGGSEEQNFTDKNLEPNS
            W+ P+K NG+++QY L  +G L++ G   E+N+T  +L   + 
FBN.. 2075  WNPPKKANGIITQYCLYMDGRLIYSG--SEENYTVTDLAVFT

FBN22 1     WSLPEKPNGLVSQYQLSRNGNLLFLGGSEEQNFTDKN-LEPNS
            W  P + NG +  Y L RNG   F G S   +F+DK  ++P   
FBN.. 3521  WRKPIQSNGPIIYYILLRNGIERFRGTS--LSFSDKEGIQPFQ

FBN22 1     WSLPEKPNGLVSQYQLSRNGNLLFLGGSEEQNFTDKNLEPNS
            W+ P  PNG+V++Y +  N  L   G +   +F  ++L P +
FBN.. 3040  WTSPSNPNGVVTEYSIYVNNKLYKTGMNVPGSFILRDLSPFT

FBN22 1     WSLPEKPNGLVSQYQLSRN-------GNLLFLGGSEEQNFTDKN--LEPNS
            W  P  PNGLV  + + R          L+ L  S    F DK   L P +
FBN.. 2644  WQPPTHPNGLVENFTIERRVKGKEEVTTLVTLPRSHSMRFIDKTSALSPWT

FBN22 1     WSLPEKPNGLVSQYQLSRNGNLLFLGGSEEQNFTDKNLEPNS
            WS P + NG++  Y +  +G L + G + +  F  + L+P + 
FBN.. 4087  WSEPMRTNGVIKTYNIFSDGFLEYSGLNRQ--FLFRRLDPFT

FBN22 1     WSLPEKPNGLVSQYQLSRNGNLLFLGGSEE----QNFTDKNLEPNS
            WS P+ PN     Y L R+G  ++    +     Q F D +L P +
FBN.. 1074  WSPPDSPNAHWLTYSLLRDGFEIYTTEDQYPYSIQYFLDTDLLPYT

FBN22 1     WSLPEKPNGLVSQYQLSRNG------NLLFLGGSEEQNFTDK--NLEPNS
            W  PEKPNG++  Y + R        ++LF+       F D+   L P + 
FBN.. 3887  WMPPEKPNGIIINYFIYRRPAGIEEESVLFVWSEGALEFMDEGDTLRPFT

FBN22 1     WSLPEKPNGLVSQYQLSRNGNLLFLGGSEEQNFTDKNLEPNS
            WS P   NG +++Y L R  N   L G +       +L+P S
FBN.. 4376  WSPPTVQNGKITKY-LVRYDNKESLAG-QGLCLLVSHLQPYS

FBN22 1     WSLPEKPNGLVSQYQL--------SRNGNLLFLGGSEEQNFTDKNLEPNS
            W+ P +PNG V  Y+L         R  N + +      +F D  L P + 
FBN.. 4657  WTGPLQPNGKVLYYELYRRQIATQPRKSNPVLIYNGSSTSFIDSELLPFT

FBN22 1     WSLPEKPNGLVSQYQL------SRNGNLLFLGGSEE----QNFTDKNLEPNS
            W  P + NG +  Y L       R   ++ +  +      Q++    L+P  
FBN.. 4552  WDPPVRTNGDIINYTLFIRELFERETKIIHINTTHNSFGMQSYIVNQLKPFH

FBN22 1     WSLPEKPNGLVSQYQLSRN-------GN--------LLFLGGSEEQN---FTDKNLEPNS  42
            WS P  PNG + +Y++ R        GN        ++F   + E+N   + D  L+P +
FBN.. 4175  WSEPVNPNGKIIRYEVIRRCFEGKAWGNQTIQADEKIVFTEYNTERNTFMYNDTGLQPWT  4234
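
Percent identity for any of the pairwise blocks above can be recomputed directly from the two gapped rows. The sketch below uses the second block (FBN22 against the repeat starting at 3610). The function name `identity` is an illustrative choice, and counting only columns where neither row has a gap is one convention among several.

```python
def identity(query, subject):
    """Return (identical residues, columns compared) for two aligned, gapped rows."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    pairs = list(zip(query, subject))
    # Only columns with a residue in both rows are compared.
    compared = sum(1 for q, s in pairs if q != "-" and s != "-")
    matches = sum(1 for q, s in pairs if q == s and q != "-")
    return matches, compared


query = "WSLPEKPNGLVSQYQLSRNGN-LLFLGGSEEQNFTDKNLEPNS"    # FBN22, starting at 3702
subject = "WSVPEKSNGVIKEYQIRQVGKGLIHTDTTDRRQHTVTGLQPYT"  # repeat starting at 3610
matches, compared = identity(query, subject)
print(f"{matches}/{compared} identical ({100 * matches / compared:.0f}%)")
```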

Pseudogene issues

Long isoform USH2A transcripts are over 15,000 bp in length. Consequently position 3743 is not even represented in the set of all human direct transcripts. Even should a retrogene arise from retropositioning, it is unlikely that the process would extend upstream so many exons. Unsurprisingly no processed pseudogenes are evident in any mammalian genome (tblastn of wgs division of GenBank). Thus no potential for confusion exists in locating orthologs of USH2A even in distant species with incomplete genomes.

Paralog issues

No close paralog exists in the human proteome according to the UCSC GeneSorter track. The nearest matches are to other proteins containing laminin or fibronectin domains. No potential for confusion with other genes exists within vertebrates; however comparative genomics at and before teleost fish divergence needs more careful treatment because of whole genome and domain expansion.

Tandem domain repeat issues

In proteins with multiple copies of a given domain, both expansion and contraction can occur over evolutionary timescales resulting in different numbers of repeats in different clades. Under these circumstances it can be difficult to establish orthologs of a given domain. However here the fibronectin domains diverged early on and the 22nd domain seems to be present in all vertebrates with genome projects as a single-copy domain (meaning here no recent duplications or losses).

The alignment of fibronectin domains in human USH2A shows pockets of conservation (notably LEPNSRY about S3743 in the 22nd FN3 domain) and certain conserved anchor residues but on the whole is mediocre due to gaps necessitated by length differences. The second alignment -- just of FN3 domains containing verified pathogenic mutations -- shows these sites are highly correlated with conserved residues (8 of 24 are represented, two sites multiple times in separate FN3 domains). Possibly some of the best conserved sites cannot be mutated without much more far-reaching effects on all tissues in which USH2A is expressed.

Analysis of the full set of fibronectin domains somewhat strengthens the case for S3743N pathogenicity (it lies in a conserved patch with two nearby sites proven pathogenic, and the similar hydroxyl residue T is the most abundant residue here and in deeper phylogeny) but not overwhelmingly (the 8th domain has N at the homologous position).

Some scepticism is in order for pathogenicity of Y4487C and Q4592H in the 30th and 31st FN3 domains in view of their position in apparently unconstrained loop positions with no observed interdomain conservation. Yet tblastn of both at GenBank wgs shows remarkable phylogenetic conservation (data not shown) within each individual repeat, similar to domain 22. This could be pursued for the other positions falling outside the conserved and semi-conserved residues characterizing fibronectins.

The issue here is disentangling universal from individual fibronectin conservation issues. The effect of interchanging order of domains, like that of deleting or adding domains, has not been studied. Substitutability of genes across species (human for sea anemone?) would help define the rigidity of FN3 subunit requirements.

What is needed here is online alignment software that accepts a fasta sequence array (in effect a simple relational db) and outputs something whose rows are individual Logos. More simply, it could use a fasta header naming scheme to collapse a conventional alignment to the index species. Many human genes have internally repeated domains and many others have full-length paralogs (which can be treated like tandem repeats). Evaluating coding SNPs is a huge issue in genomic medicine and even slight improvements in forecasting could have significant benefits.
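
A rough idea of what such a tool would compute is sketched below. The header scheme (`FN22|homSap`, domain label then species) is hypothetical, as are the function names. The three short sequences are the last six columns of the human, mouse and chicken rows from the ortholog alignment further down this page. Each domain's rows are collapsed into per-column residue counts, which is the raw material of a Logo.

```python
from collections import Counter, defaultdict


def profiles_by_domain(records):
    """Group aligned sequences by domain label and tally residues per column.

    records: iterable of (header, aligned_sequence); the header is assumed to
    look like 'FN22|homSap' (domain label, then species).
    Returns {domain: [Counter for column 0, Counter for column 1, ...]}.
    """
    grouped = defaultdict(list)
    for header, seq in records:
        domain = header.split("|", 1)[0]
        grouped[domain].append(seq)
    profiles = {}
    for domain, seqs in grouped.items():
        if len({len(s) for s in seqs}) != 1:
            raise ValueError(f"{domain}: sequences are not aligned to equal length")
        profiles[domain] = [Counter(column) for column in zip(*seqs)]
    return profiles


def consensus(profile):
    """Most common residue per column, a crude one-line stand-in for a Logo."""
    return "".join(col.most_common(1)[0][0] for col in profile)


records = [
    ("FN22|homSap", "LEPNSR"),
    ("FN22|musMus", "LEPGSR"),
    ("FN22|galGal", "LQPNSR"),
]
profiles = profiles_by_domain(records)
for domain, profile in profiles.items():
    print(domain, consensus(profile))
```

With one row of output per domain, conservation specific to a single repeat can be told apart from conservation common to all fibronectin domains.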

FN3align.jpg

FN3patho.jpg

Comparative genomics of vertebrate USH2A

The alignments below show the orthologous exon from 46 species. While no variation at S3743 occurs in any mammal or bird USH2A, lizard is possibly anomalous with asparagine in its best matching FN3 domain, as are some fish with arginine and early-diverging deuterostomes and cnidaria with threonine.

However the lizard situation is bioinformatically uncertain because the 3 exons centering on 3743 are missing from the UCSC genome assembly upon whole USH2A blat, whereas the best matching domain is present in AAWZ01000661 upon tblastn at wgs, with the asparagine supported by 4 raw trace reads. The putative relevant exon itself is unexpectedly diverged, causing it to fall to the bottom of the alignment tree in conflict with phylogenetic position. It further has an unusual one residue deletion 6 amino acids prior to 3743.

Consequently the Anolis feature may not represent the orthologous exon of a functioning gene copy. However it provides support for the idea that some fibronectin domains in some species can tolerate asparagine at the corresponding position. Thus N3743 in lizard, if valid, detracts only mildly from the story of invariant S3743 (with T3743 tolerated), given that the divergence from mammals dates to some 310 myr ago.

The arginine anomaly in four teleost fish but not zebrafish cannot be a read or assembly error. S3743R is not at all a conservative substitution. Parsimoniously, it represents a single event in a late diverging clupeomorph fish. Since it has persisted in descendant lineages, it may represent adaptive change. Note that in the alignment below shark has threonine at this position, as do amphioxus and sea urchin. Lamprey genome is incomplete here.

In summary, S3743 has been fixed for billions of years of branch length within mammals and beyond. The reduced alphabet here is very restricted (outside of rapidly evolving teleost fish), with T3743 probably ancestral and nearly the full extent of admitted variation. Note that asparagine codons, like threonine codons, lie a single base change away from serine codons (a G-->A transition from AGC or AGT in the case of asparagine), so reaching them requires no second mutational step and no intermediate barrier. The absence of asparagine therefore reflects selection rather than mutational inaccessibility. This implies a small amino acid with hydroxyl at this position is critical to proper functionality of USH2A. Hydrogen-bonding capability (e.g. asparagine) is likely not sufficient in a substituent for serine.
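
The mutational accessibility of asparagine and threonine from serine can be checked by enumerating single-base changes under the standard genetic code. This is a sketch under stated assumptions: the text does not give the actual codon at position 3743, so all six serine codons are examined, and the function name `single_step_routes` is illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def single_step_routes(src_aa, dst_aa):
    """List (codon, mutant codon, kind) for every one-base change from src_aa to dst_aa."""
    routes = []
    for codon, aa in CODON_TABLE.items():
        if aa != src_aa:
            continue
        for i, base in enumerate(codon):
            for new in BASES:
                if new == base:
                    continue
                mutant = codon[:i] + new + codon[i + 1:]
                if CODON_TABLE[mutant] == dst_aa:
                    kind = "transition" if (base, new) in TRANSITIONS else "transversion"
                    routes.append((codon, mutant, kind))
    return routes


ser_to_asn = single_step_routes("S", "N")
ser_to_thr = single_step_routes("S", "T")
for label, routes in [("S->N", ser_to_asn), ("S->T", ser_to_thr)]:
    print(label, routes)
```

Only the AGT and AGC serine codons reach asparagine in one step, and both do so by a G-->A transition at the second codon position, consistent with the G-->A change described for S3743N. Threonine is reachable from every serine codon in one step, but always by a transversion.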

Thus S3743N, though it could be an adaptive functional innovation, is most likely a maladaptive mutation. The symptoms of Usher Syndrome 2A are the likeliest outcome in the homozygote given the situation at the many known other disease alleles, though the penetrance and age of onset remain unpredictable.

As a cautionary note, a distinction must be observed between significant impact to normal function and significant impact to fitness. For example, sickle-cell hemoglobin evidently disrupts normal protein function, yet it adds to fitness (malarial resistance) in the heterozygote. Here allele population statistics are illuminating. Prion disease complicates that: amyloid and dementia surely add to neither normal function nor fitness, yet age of onset of familial CJD is so late that harmful alleles rarely came into play during lifespans typical of almost all of human evolution. This has allowed certain lethal alleles to attain substantial frequencies through founder effect and drift.

             ............................................................^. hatch marks S3743 site
USH2A_homSap GVWVTPRHIIINSTTVELYWSLPEKPNGLVSQYQLSRNGNLLFLGGSEEQNFTDKNLEPNSR
USH2A_panTro GVWVTPRHIIINSTTVELYWSLPEKPNGLVSQYQLSRNGNLLFLGGSEEQNFTDKNLEPNSR
USH2A_gorGor GVWVTPRHIIINSTTVELYWSLPEKPNGLISQYQLSRNGNLLFLGGSEEQNFTDKNLEPNSR
USH2A_ponAbe GVWVTPRHIIINSTTVELYWSLPEKPNGLISQYQLSRNGNLLFLGGSEKQNFTDKNLEPNSR
USH2A_nomLeu GVWVTPRHIIINSTTVELYWSLPEKPNGLISQYQLSRNGNLLFLGGSEEQKFTDKNLEPNSR 
USH2A_macMul GVWVTPRHIIINSTTVELYWSLPEKPNGLISQYQLSRNGNLLFLGGSEEQNFTDKNLEPNSR
USH2A_calJac GVWVTPRHIIINSTTVELYWSLPEKPNGLISQYQLSRNGNLLFLGGSEEQNFTDKNLQPNSR
USH2A_tarSyr GVWVTPRHIIINSTTVELYWSPPEKPNGLISQYQLSRNGTLLFLGGSEEQNFTDKNLEPHSR
USH2A_micMur GVWVTPRHIIINSTTVELYWSPPEKPNGLISQYQLRRNGTLLFLGGSEEQNFTDKNLEPNSR
USH2A_tupBel GVWVTPRHIIINSTTVELYWSLPKKPNGLISQYQLSRNGTLLFLGGSEEQNFTDKNLEPDSR
USH2A_musMus GVWVTPRHIIINSTTVELYWNPPERPNGLISQYQLRRNGSLLLVGGRDNQSFTDSNLEPGSR
USH2A_ratNor GVWVTPRHIIINSTTVELYWNPPERPNGVISQYRLRRNGSLLLVGGRDDQSFTDKNLEPNSR
USH2A_dipOrd GVWVTPRHIIINSTAVELYWSPPEKPNGLISQYQLSRNGSVLFLGGREEQMFTDTNLEPNSR
USH2A_cavPor GVWVTPRHTVINSTSVELYWSPPEKPNGLISQYRLSRNGTLLFVGGGEEQNFTDKHLEPNSR
USH2A_speTri GVWVTPRHMIINSTTVELYWSPPEKPNGLISQYQLSRNGTLLLLGGSEERNFTDKHLEPNSR
USH2A_oryCun GVWVTPRHIIINSTTVELYWTPPEKPNGLISQYQLNRNGIVVFLGGSKEQNFTDRNLKPNSR
USH2A_ochPri GVWVSPRHIVINCTAVILYWSPPEKPNGIISQYQLIRNETVLYLGSGKEQNFTDGNLEPNSR
USH2A_vicPac GVWVTPRHIIINPTTVELYWSPPEKPNGLISQYQLSRNGTLVFLGGSEEQNFTDKNLEPNSR
USH2A_susScr GVWVTPRHIIVNSTTVELYWSLPEKPNGLISQYQLSRNGTVVFLGGSEERNFTDKNLEPNSR
USH2A_turTru GVWVTPRHIIINSTTVELYWSLPEKPNGLISQYQLSRNGSLVFLGGSEEQNFTDKNLEPNSR
USH2A_bosTau GVWVTPRHIVVNSTTVELFWSPPEKPNGLVSQYQLSRNGSLIFLGGSEEHNFTDKNLEPNSR
USH2A_equCab GVWMTPRHIIINSTTVELYWSPPENPNGLISQYQLSRNGTLVFLGGSEEQNFTDKNLEPNSR
USH2A_felCat GVWVTPRHIIINSTTVELYWSPPEKPNGLISQYQLSRNGTLVFLGGNEEQNFTDKNLEPNSR
USH2A_canFam GVWVTPRHIIINSTTVELYWNPPEKPNGLISQYQLSRNGTLVFLGGSEEQNFTDKNLEPNSR
USH2A_myoLuc GVWATPRHIIINATAVELYWRPPERPNGLISRYQLIRNGTSVFLGGSEDQHFTDHNLAPNSR
USH2A_pteVam GVWVTPQHIIINSTAVELCWSPPEEPNGLISQYRLSRDGNLVFLAGAEEHCFTDKNLEPNSR
USH2A_loxAfr GVWLTPRHIIINPTTVELYWSQPEKPNGLISRYHLRRNGTLVLLGGSEEQNFTDKNLEPNSR
USH2A_proCap GVWMTPRHIVINSTTVELHWSLPEKPNGHISQYRLRRNGTLVFQGGGEEQNFTDTNLEPNSR
USH2A_dasNov GVWVTPGHIIINSTTVELYWSQPEKPNGLISHYQLSRNGTLIFLAGREEQSFTDKNLEPNSR
USH2A_choHof GVWVTPQHIIINSTTVELYWSQPEKPNGLISQYQLSRNGTSVFQGGREEQHFTDKNLEPSSR
USH2A_monDom GVWSIPRHIIINSTTVELYWNEPEKPNGLISKYQLHRNGTVIFLGGREDQNFTDDSLEPKSS
USH2A_ornAna GVWSKPQHITVSSTTVELYWSQPEKPNGVISQYRLIRNGTEIFAGTRDSLNFTDDSLESNSR
USH2A_galGal GVWPKPHHIIVSSTEVEIYWSEPEIPNGLITQYRLFRDEEQIFLGGSRDLNFTDVNLQPNSR
USH2A_taeGut GVWPKPHHIIVSSTEVEMYWSEPEEPNGLITHYRLFRDGEQIFLGGSTARNFTDVNLQPNSR
USH2A_anoCar GVWSQPRHVIVSSKIVELYWDEPEEPNGIISLYRLFRNGEEIFMGGELNLNFTD-TVQPNNR 4 traces, not in assembly
USH2A_xenTro GVWSNPYHVTINESVLELYWSEPETPNGIVSQYRLILNGEVISLRSGECLNFTDVGLQPNSR
USH2A_tetNig GVWSKPRHLTVNASAVELHWDPPQQPQGLVSQYRLKRDGRAVFTGDHLQRNYTDAGLQPQRR
USH2A_takRub GVWSKPRHLIVTTAVVELYWDPPQQPHGHISQYKLKRDGQTVFTGDHDDQNYTDTGLRPHRR
USH2A_gasAcu GVWSSPRHVVINTSAVELYWDQPLQPNGHISQYRLNRDGDTIFTGDHREQNYTDTGLLPNRR
USH2A_oryLat GVWSKPRHLIINTSAVELYWDQPSQPNGLISQYRLIRDGLTVFTGARRDQNYTDTGLEPKRR
USH2A_danRer GVWSMPRHIQLNSSAVELHWSDPLKLNGLLSGYRLLRDGELVFTADGGKMSYTDAGLQPNTR
USH2A_calMil GIWPKPCHVIVNSSTVELYWTEPEKPNGIITQFRLLRDNAVIYTGTRRNRNYTDAGLQPDTR
USH2A_braFlo QEVSRPRFVVVSSTEIEVYWSEPGRPNGIITQYQLVRDGSVIYSGG--DMNFTDSGLTPSTT XM_002214612 aligns over 2807 aa
USH2A_strPur EGLMQPTHVVVSSTILELYWFEPSQPNGVITSYILYRDDELVYSGNNSVLTYVDTGLTPNTR XM_788345  aligns over 5030 aa
USH2A_nemVec SQQPAPVITVSSSRRLDLAWSPPDNPNGIILRYELYRNGTEVYRG--VIRGYNDTNLQPDTL XM_001638773 aligns over 3005 aa
USH2A_hydMag SQQGAPFVLFQTSRLINIGWFPPDNLNGILIKYELYRDRTKIFVG--LDNNYTDNNLKPYTY XM_002165140 
             ............................................................^.
 USH_homSap  GVWVTPRHIIINSTTVELYWSLPEKPNGLVSQYQLSRNGNLLFLGGSEEQNFTDKNLEPNSR
 USH_panTro  ..............................................................
 USH_gorGor  .............................I................................
 USH_macMul  .............................I................................
 USH_calJac  .............................I...........................Q....
 USH_ponAbe  .............................I..................K.............
 USH_nomLeu  .............................I....................K...........
 USH_turTru  .............................I.........S.V....................
 USH_tupBel  .......................K.....I.........T...................D..
 USH_susScr  ..........V..................I.........TVV.......R............
 USH_tarSyr  .....................P.......I.........T...................H..
 USH_micMur  .....................P.......I.....R...T......................
 USH_vicPac  ............P........P.......I.........T.V....................
 USH_felCat  .....................P.......I.........T.V....N...............
 USH_canFam  ....................NP.......I.........T.V....................
 USH_equCab  ...M.................P..N....I.........T.V....................
 USH_bosTau  .........VV.......F..P.................S.I.......H............
 USH_cavPor  ........TV....S......P.......I...R.....T...V..G........H......
 USH_speTri  ........M............P.......I.........T..L......R.....H......
 USH_oryCun  ....................TP.......I.....N...IVV.....K......R..K....
 USH_dipOrd  ..............A......P.......I.........SV.....R...M...T.......
 USH_loxAfr  ...L........P........Q.......I.R.H.R...T.VL...................
 USH_dasNov  ......G..............Q.......I.H.......T.I..A.R...S...........
 USH_choHof  ......Q..............Q.......I.........TSV.Q..R...H........S..
 USH_proCap  ...M.....V........H.........HI...R.R...T.V.Q..G.......T.......
 USH_musMus  ....................NP..R....I.....R...S..LV..RDN.S...S....G..
 USH_ratNor  ....................NP..R...VI...R.R...S..LV..RDD.S...........
 USH_myoLuc  ...A........A.A.....RP..R....I.R...I...TSV......D.H...H..A....
 USH_pteVam  ......Q.......A...C..P..E....I...R...D...V..A.A..HC...........
 USH_monDom  ...SI...............NE.......I.K...H...TVI....R.D.....DS...K.S
 USH_ochPri  ....S....V..C.A.I....P......II.....I..ETV.Y..SGK......G.......
 USH_ornAna  ...SK.Q..TVS.........Q......VI...R.I...TEI.A.TRDSL....DS..S...
 USH_galGal  ...PK.H...VS..E..I...E..I....IT..R.F.DEEQI.....RDL....V..Q....
 USH_taeGut  ...PK.H...VS..E..M...E..E....ITH.R.F.D.EQI.....TAR....V..Q....
 USH_xenTro  ...SN.Y.VT..ESVL.....E..T...I....R.IL..EVIS.RSG.CL....VG.Q....
 USH_anoCar  ...SQ...V.VS.KI.....DE..E...II.L.R.F...EEI.M..ELNL....-TVQ..N.
             ............................................................^.
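
The second block above is the first block rewritten as differences from human, and the marked column can be tallied the same way. A minimal sketch of both steps follows, using the human row and four of the rows copied from the first block. The function name `dot_view` is illustrative, and the site index relies on the observation that S3743 is the second-to-last column of the rows as printed.

```python
from collections import Counter


def dot_view(reference, sequence):
    """Replace residues identical to the reference with '.'; keep differences."""
    return "".join("." if s == r else s for r, s in zip(reference, sequence))


human = "GVWVTPRHIIINSTTVELYWSLPEKPNGLVSQYQLSRNGNLLFLGGSEEQNFTDKNLEPNSR"
rows = {
    "gorGor": "GVWVTPRHIIINSTTVELYWSLPEKPNGLISQYQLSRNGNLLFLGGSEEQNFTDKNLEPNSR",
    "anoCar": "GVWSQPRHVIVSSKIVELYWDEPEEPNGIISLYRLFRNGEEIFMGGELNLNFTD-TVQPNNR",
    "tetNig": "GVWSKPRHLTVNASAVELHWDPPQQPQGLVSQYRLKRDGRAVFTGDHLQRNYTDAGLQPQRR",
    "strPur": "EGLMQPTHVVVSSTILELYWFEPSQPNGVITSYILYRDDELVYSGNNSVLTYVDTGLTPNTR",
}

# S3743 sits in the second-to-last column of these 62-column rows.
SITE = len(human) - 2

for name, seq in rows.items():
    print(f"{name} {dot_view(human, seq)}  site residue: {seq[SITE]}")

tally = Counter(seq[SITE] for seq in [human, *rows.values()])
print(tally)
```

Extending `rows` to all species gives the full residue inventory at the site, which is the reduced alphabet discussed in the summary.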

Fasta sequences for the 35 fibronectin domains of USH2A are shown below (as delineated at SwissProt). Those containing a mutation known to give rise to Usher Syndrome 2A contain * in their header -- some are found in patients but are of uncertain pathogenicity. The mutation itself is flanked by spaces in the fasta sequence for readability. There are 22 known sites in 18 different fibronectin domains according to this analysis. Note still other pathogenic mutations occur interstitially (between FN3 domains).

Fibronectin domains in human:

>01*1058-1143 86aa P1059L uncertain pathogenicity
P P PRGQVQSSSAINLSWSPPDSPNAHWLTYSLLRDGFEIYTTEDQYPYSIQYFLDTDLLPYTKYSYYIETTNVHGSTRSVAVTYKT

>02*1145-1238 94aa P1212L
PGVPEGNLTLSYIIPIGSDSVTLTWTTLSNQSGPIEKYILSCAPLAGGQPCVSYEGHETSATIWNLV P FAKYDFSVQACTSGGCLHSLPITVTT

>03:1242-1357 116aa
PPQRLSPPKMQKISSTELHVEWSPPAELNGIIIRYELYMRRLRSTKETTSEESRVFQSSGWLSPHSFVESANENALKPPQTMTTITGLEPYTKYEFRVLAVNMAGSVSSAWVSERT

>04:1367-1462 96aa
PPSVFPLSSYSLNISWEKPADNVTRGKVVGYDINMLSEQSPQQSIPMAFSQLLHTAKSQELSYTVEGLKPYRIYEFTITLCNSVGCVTSASGAGQT

>05:1871-1949 79aa
GAVVNLASVSSGAVRVNLDGCLSTDSAVNCRGNDSILVYQGKEQSVYEGGLQPFTEYLYRVIASHEGGSVYSDWSRGRT

>06:1954-2051 98aa
PQSVPTPSRVRSLNGYSIEVTWDEPVVRGVIEKYILKAYSEDSTRPPRMPSASAEFVNTSNLTGILTGLLPFKNYAVTLTACTLAGCTESSHALNIST

>07.2052-2138 87aa
PQEAPQEVQPPVAKSLPSSLLLSWNPPKKANGIITQYCLYMDGRLIYSGSEENYIVTDLAVFTPHQFLLSACTHVGCTNSSWVLLYT

>08.2142-2236 95aa
PPEHVDSPVLTVLDSRTIHIQWKQPRKISGILERYVLYMSNHTHDFTIWSVIYNSTELFQDHMLQYVLPGNKYLIKLGACTGGGCTVSEASEALT

>09*2241-2325 85aa A2249D
PEGVPAPK A HSYSPDSFNVSWTEPEYPNGVITSYGLYLDGILIHNSSELSYRAYGFAPWSLHSFRVQACTAKGCALGPLVENRTL

>10*2328-2432 105aa R2354H
PPEGTVNVFVKTQGSRKAHVRWEAPF R PNGLLTHSVLFTGIFYVDPVGNNYTLLNVTKVMYSGEETNLWVLIDGLVPFTNYTVQVNISNSQGSLITDPITIAMPP

>11:2435-2528 94aa
PDGVLPPRLSSATPTSLQVVWSTPARNNAPGSPRYQLQMRSGDSTHGFLELFSNPSASLSYEVSDLQPYTEYMFRLVASNGFGSAHSSWIPFMT

>12:2533-2619 87aa
PGPVVPPILLDVKSRMMLVTWQHPRKSNGVITHYNIYLHGRLYLRTPGNVTNCTVMHLHPYTAYKFQVEACTSKGCSLSPESQTVWT

>13:2621-2718 98aa
PGAPEGIPSPELFSDTPTSVIISWQPPTHPNGLVENFTIERRVKGKEEVTTLVTLPRSHSMRFIDKTSALSPWTKYEYRVLMSTLHGGTNSSAWVEVT

>14*2724-2812 89aa A2795S
PAGVQPPVVTVLEPDAVQVTWKPPLIQNGDILSYEIHMPDPHITLTNVTSAVLSQKVTHLIPFTNYSVTIV A CSGGNGYLGGCTESLPT

>15.2821-2920 100aa 
PQNVGPLSVIPLSESYVVISWQPPSKPNGPNLRYELLRRKIQQPLASNPPEDLNRWHNIYSGTQWLYEDKGLSRFTTYEYMLFVHNSVGFTPSREVTVTT

>16.2925-3015 91aa
PERGANLTASVLNHTAIDVRWAKPTVQDLQGEVEYYTLFWSSATSNDSLKILPDVNSHVIGHLKPNTEYWIFISVFNGVHSINSAGLHATT

>17:3020-3105 86aa
PQGMLPPEVVIINSTAVRVIWTSPSNPNGVVTEYSIYVNNKLYKTGMNVPGSFILRDLSPFTIYDIQVEVCTIYACVKSNGTQITT

>18*3110-3200 91aa R3124G uncertain pathogenicity
PSDIPTPTIRGITS R SLQIDWVSPRKPNGIILGYDLLWKTWYPCAKTQKLVQDQSDELCKAVRCQKPESICGHICYSSEAKVCCNGVLYNP

>19:3404-3494 91aa
PASMEATEHCGRCDFNFTSHICTVIRGSHNSTGKASIEEMCSSAEETIHTGSVNTYSYTDVNLKPYMTYEYRISAWNSYGRGLSKAVRART

>20*3499-3585 87aa P3504T W3521R T3571M
PQGVS P PTWTKIDNLEDTIVLN W RKPIQSNGPIIYYILLRNGIERFRGTSLSFSDKEGIQPFQEYSYQLKAC T VAGCATSSKVVAAT

>21:3590-3676 87aa
PESILPPSITALSAVALHLSWSVPEKSNGVIKEYQIRQVGKGLIHTDTTDRRQHTVTGLQPYTNYSFTLTACTSAGCTSSEPFLGQT

>22*3677-3767 91aa S3743N uncertain pathogenicity
LQAAPEGVWVTPRHIIINSTTVELYWSLPEKPNGLVSQYQLSRNGNLLFLGGSEEQNFTDKNLEPN S RYTYKLEVKTGGGSSASDDYIVQT

>23:3768-3862 95aa
PMSTPEEIYPPYNITVIGPYSIFVAWIPPGILIPEIPVEYNVLLNDGSVTPLAFSVGHHQSTLLENLTPFTQYEIRIQACQNGSCGVSSRMFVKT

>24*3863-3960 98aa G3895E
PEAAPMDLNSPVLKALGSACIEIKWMPPEKPN G IIINYFIYRRPAGIEEESVLFVWSEGALEFMDEGDTLRPFTLYEYRVRACNSKGSVESLWSLTQT

>25*3961-4062 102aa T3976M S4054I
LEAPPQDFPAPWAQA T SAHSVLLNWTKPESPNGIISHYRVVYQERPDDPTFNSPTVHAFTVKGTSHQAHLYGLEPFTTYRIGVVAANHAGEIL S PWTLIQTL

>26*4066-4150 85aa R4115C uncertain pathogenicity
PSGLRNFIVEQKENGRALLLQWSEPMRTNGVIKTYNIFSDGFLEYSGLN R QFLFRRLDPFTLYTLTLEACTRAGCAHSAPQPLWT

>27*4154-4258 105aa P4232R
PPDSQLAPTVHSVKSTSVELSWSEPVNPNGKIIRYEVIRRCFEGKAWGNQTIQADEKIVFTEYNTERNTFMYNDTGLQ P WTQCEYKIYTWNSAGHTCSSWNVVRT

>28*4265-4351 87aa T4337M
GLSPPVISYVSMNPQKLLISWIPPEQSNGIIQSYRLQRNEMLYPFSFDPVTFNYTDEELLPFSTYSYALQAC T SGGCSTSKPTSITT

>29*4356-4439 84aa T4425M T4439I
PSEVSPPDLWAVSATQMNVCWSPPTVQNGKITKYLVRYDNKESLAGQGLCLLVSHLQPYSQYNFSLVAC T NGGCTASVSKSAW T

>30*4444-4528 85aa Y4487C
PENMDSPTLQVTGSESIEITWKPPRNPNGQIRSYELRRDGTIV Y TGLETRYRDFTLTPGVEYSYTVTASNSQGGILSPLVKDRTS

>31*4529-4627 99aa Q4592H
PSAPSGMEPPKLQARGPQEILVNWDPPVRTNGDIINYTLFIRELFERETKIIHINTTHNSFGM Q SYIVNQLKPFHRYEIRIQACTTLGCASSDWTFIQT

>32:4633-4730 98aa
LMQPPPHLEVQMAPGGFQPTVSLLWTGPLQPNGKVLYYELYRRQIATQPRKSNPVLIYNGSSTSFIDSELLPFTEYEYQVWAVNSAGKAPSSWTWCRT

>33*4732-4825 94aa L4795R P4818L
PAPPEGLRAPTFHVISSTQAVVNISAPGKPNGIVSLYRLFSSSAHGAETVLSEGMATQQTLHG L QAFTNYSIGVEACTCFNCCSKG P TAELRTH

>34:4826-4927 102aa
PAPPSGLSSPQIGTLASRTASFRWSPPMFPNGVIHSYELQFHVACPPDSALPCTPSQIETKYTGLGQKASLGGLQPYTTYKLRVVAHNEVGSTASEWISFTT 

>35:4928-5014 87aa
QKELPQYRAPFSVDSNLSVVCVNWSDTFLLNGQLKEYVLTDGGRRVYSGLDTTLYIPRTADKTFFFQVICTTDEGSVKTPLIQYDTS
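
Because the headers above carry both coordinates and mutation names, the space-flanked residues can be checked mechanically against them. The sketch below does this for domains 20 and 22, copied from the listing. The regular expressions encode the header layout as it appears on this page and are an assumption, not a published format, and `check_record` is an illustrative name.

```python
import re

# Header layout assumed from this page: >NN[*:.]start-end LENaa [mutations ...]
HEADER = re.compile(r">(\d+)[*:.](\d+)-(\d+)\s+(\d+)aa(.*)")
MUTATION = re.compile(r"\b([A-Z])(\d+)([A-Z])\b")


def check_record(header, sequence):
    """Return (marked, declared): space-flanked residues vs mutations named in the header."""
    m = HEADER.match(header)
    if m is None:
        raise ValueError(f"unrecognised header: {header!r}")
    start, end, length = int(m.group(2)), int(m.group(3)), int(m.group(4))
    declared = [(ref, int(pos)) for ref, pos, _alt in MUTATION.findall(m.group(5))]

    marked, position = [], start
    for i, ch in enumerate(sequence):
        if ch == " ":
            continue
        before = i > 0 and sequence[i - 1] == " "
        after = i + 1 == len(sequence) or sequence[i + 1] == " "
        if before and after:
            marked.append((ch, position))
        position += 1

    if position - start != length or end - start + 1 != length:
        raise ValueError("sequence length disagrees with header coordinates")
    return marked, declared


h22 = ">22*3677-3767 91aa S3743N uncertain pathogenicity"
s22 = "LQAAPEGVWVTPRHIIINSTTVELYWSLPEKPNGLVSQYQLSRNGNLLFLGGSEEQNFTDKNLEPN S RYTYKLEVKTGGGSSASDDYIVQT"
h20 = ">20*3499-3585 87aa P3504T W3521R T3571M"
s20 = "PQGVS P PTWTKIDNLEDTIVLN W RKPIQSNGPIIYYILLRNGIERFRGTSLSFSDKEGIQPFQEYSYQLKAC T VAGCATSSKVVAAT"

for header, seq in [(h22, s22), (h20, s20)]:
    marked, declared = check_record(header, seq)
    print(header.split()[0], "consistent" if marked == declared else "MISMATCH", marked)
```

A mismatch would flag either a mis-numbered mutation or a misplaced space. A mutation at the very first residue of a domain would not be detected by this flanking rule.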

Comparative genomics of non-vertebrate USH2A

The interest here is primarily in the evolutionary origins of vision and hearing, though the function and interactions of usherin in human may be both retained and easier to study in orthologs in early diverging species. Surprisingly both number and order of fibronectin domains were largely fixed early on in eumetazoan history. This implies today's targets of fibronectin binding domains are not new but have been co-evolving with usherin since the pre-Cambrian.

Below, consecutive fibronectin domains are provided as separate fasta sequences for echinoderm and cnidarian. Each of these -- including that of S3743N -- is reciprocal best-blastp to the identically numbered fibronectin domain in human (data shown only for FN3 20-25 comparison of hydra/human). While it might seem that a deletion of one domain would only weaken overall binding by an inconsequential 3%, no such event seems ever to have been fixed. It's also a bit mysterious how disease could arise from a mere substitution in one of 35 fibronectin domains. The fibronectin domains of usherin must be a precision fit to their individualized binding targets.

>01:SP:908-993 Strongylocentrotus purpuratus (sea urchin) XM_001179293 
PPPVWQAVNPFSILLTWGPPDEPNGNILTYLLYRNNTLVYNGTAANPLGIQQFEDQNLSPFTTYSYYVQSANEAGRATSPVVIAQT
>02:SP:994-1091
IPAGFDVLTITNVQARSADFSWSQPADISATVTQYILTSMTPSKPDPPTQHFNGLATSYQATDLIPFTNYTFYLTVCTPGSC
>03:SP:1092-1207
PPQGVMDPNATALSQSSVFVSWDHPTEANGIITHYELFFRGYPGPDGTIDPPETRIFYPAGWFNPRPVLTPLEDPAEPPVTNFTHTGLDAFTRYQYRVTARNLAGTGHSSWTTVRT
>04:SP:1218-1311
PSVLGVSSSELNITWPQPQDNEIRGVVISYRLYRYITSDDPFAPPQTQELVYSGDGSGMFFVLGSLEPYSIHTFSIEACNSIGCILSPRSSGRT
>05:SP:1732-1808
VNLTHDMTGILNVYLDGCPPITSSSIQCSDPQSVSVYDGSTMNFTDNGLHVFTEYLYQVTATNIAGSTDGAWAAGRT
>06:SP:1813-1909
PIGLSPPIDPVSITGYVIQLQWQRPSGNTGLLTQYILSAYNLDLPDIAPVQAVFTDTTFSENIGNITDVIPYTNYDVSVTACTAGGCSESSAVSVRT 
>07:SP:1911-1997
EEAPSGVQSPEALSKTARTITAGWALPDRPNGIITSYQLQLNSVTVYTGTARNYTLSGLSVFNPYRLVLIACTQIGCSSSAEVTITT
>08:SP:2002-2093
PSSVDPPTLFPRSPRSIEARWTAPSQPNGILQRYVLYSSTVNGVVGDAVYNTSDLFTDFIINDLSPGTVYYLSVAACTGGGCTISAQSSVRT
>09:SP:2098-2182
PEGVPAPIITAASPLELVVTWTHPTQPNGVIISYALVQNGIVIQNSTSMSYTASNLSPWSLHVFRVEACTSKGCTFGPEGSGRTL
>010:SP:2185-2288
PPQGSIDLAVFTMGSRSVKATWTSPAQPNGMLVYEVLFTGIFYVAPEASNYALVTETRALYNGTIANEIVDITGLIPYSAYTIQVRAYNTEGSILSNQRTVTMP
>011:SP:2293-2385
DGVLPPTLISTGPTSIQATWTEPARNNAPGDPSFQLRYRPANQPGNQIDVFSNSVRVTSYTLTGLQAFNEYEFRVIAINNAGQTLSQWAAVFT
>012:SP:2390-2476
PGPIDPPMATDVRPYSTIVTWEFPDTPNGIITSIKLYQNNVLKTTLAGNATQLLIDDLTPFSDYMFSVEMCNTVGCTRSPDSITYTT
>013:SP:2480-2573
APSGQSPPVLSSPTPTSVQLSWSPPSMPNGVLTGYEIERRHQGFTAIFSVVSVAAGAPRAYLDLSAAITPYTAYEYRVKVTNAAGSSTSQWASV
>014:SP:2580-2661
PGGVREPSVTVLGPDSVMVSWEEPAMSNGEILSYTIRMPDPRIYLDQTNMTSYIVYNLVPYTDCSVTIEACTAGGCTQSNPT
>015:SP:2681-2769
ITQTYISVIWSPPVRPNGPNIRYELHRQKLREPLSTGPVQGLNFWQFIYNGEDTTFQDFGLSTFTTYIYRITVYNDIGSATSDPSDEVT
>016:SP:2775-2858
PTVAGTISAVAMDHISVLLNWTTPSLLQLQGDVVNYFILVSSPSRQYELTYDPGVSSDLLTSLMPNTEYEFRQVIYNGAFNITS
>017:SP:2869-2952
GFAAPLISVLSPSAVNVMWLPPTQPNGDITQYVIYLDGERHGSVESNQLLYIMAGLQPYTIYSIQVEVCTEYDCLLSNATVVTT
>018:SP:2957-3046
PSGVTAPNLLVLGPRALEVSWASPASPNGIILGYEIQRREYQPCSDRPSTPSDGSESSCGYVECLRSESVCGNQCYSGLQACCDGILHEP
>019:SP:3254-3348
PISQAVSSYCDRCDFDANIDTCYSVDASYITSSPSGPGTPSGSNGLCPSALISIYTSGPNIYSYLDPGLSPYTRYEYLISAINAAGSSNSGLSNA
>020:SP:3355-3441
PVQVQAPEWSVEMGVLDTIQLAWDPPQQPNGIIVTYILRRDGIEIYRGADTSHSDNTGIQPFQHYSYTLSACTRVGCAASQPVVAAT
>021:SP:3446-3528
PENLFDPTLAALTSESILITWQEPGLPNGVIQEYSILQTGVAEPIYRGGPEGFQFTHTDLEPFTVYEYRLQACTSAGCSQSNP
>022:SP:3534-3623 threonine (red) corresponds to human S3743
MQAAPEGLMQPTHVVVSSTILELYWFEPSQPNGVITSYILYRDDELVYSGNNSVLTYVDTGLTPNTRYEYLVSAMTVAGGSNSSVHVAQT
>023:SP:3624-3716
PFITPEGIPPPTLRVLSASSIEATWTQPTVPNGIIRQYGIVILSGTQDERRLVATSVTMLVVDNLTPYTRYDMRVQACGDSGCGVGPRAYART
>024:SP:3718-3813
EAAPEEQGPPTLVSTGPSVVEVSWDPPARPNGIIQQYYVYRRQYRTTQELLVFLTTTEQSFVNAGGGLTAFNLYEYRVRVENSQGSTDSPWASIRT
>025:SP:3815-3915
ESVPVGLAAPNLVAVSPYAVQGTWTPPSSPNGIIAYYRIEYQERPNDPTATPAIVTAATVEGTVLQTTFYGLLPFTSYQVRIIAINGAGETSGPWGSVTTL
>026:SP:3919-4006
PGDIGQLTVEQQSNGLALLLRWDEPGQPNGVITNYFIYEDEYLIAPIYTGLTREFLFRRLTPYTEYVVVLEACTNVGCGRGNPQTVRT
>027:SP:4011-4120
PANQAPPTLGFVNSTAVVLNWKAPVNPNGAITQYDVIRRSARPTSPDRRKRNTDESAFTVTVVIHSEYNTDAEDYSFIDNSLQPYRIYEFRIQSINSQGSVDSDWVQVDT
>028:SP:4127-4213
GVAPPTVNHITNTPGSLVIGWTPPTESNGIITGYRLQRNSSTPFSFEADYGFQFTDMNLQAFTVYVYTITVCTQGGCTMSSATSIRT
>029:SP:4218-4294
PTFVDPPSPVPISSSQIMVNWTTPSIASGEIMQYRLKVDDEVRYSGLGLSTTISGLIPHQEYTFVLEACTSGGCTDS
>030:SP:4306-4395
PTGMNPPSLRVLSATALEASWSEPNEPNGIISRYELRRDGKLVYDGDAQRYQDFGDGGLGLTPGQQYSYVITAFNSRGHAISDAAVITTS
>031:SP:4397-4476
SSPAGLSPPIVSPLSSQIIGATWQPPAFPNGEIQNYTLYVDNSIVYSGRLFSFDIRNLDFYALYEIRVGACTTSGCALSD
>032:SP:4490-4587
QVAPSLEPLADNLGIASGIRAVWNAPSNDNGIILGYELYRRSVDRTTGGRSDPTLLVNTTLREYVDMDSALIPDETYEYLVVSFNSVGEASSPWASVR
>033:SP:4591-4666
APPQGLLPPVVSELEATSLTLTIQEPSQPNGAIQRYVIYRNSTVLSQTTLTSYRDSDLLPFTTYSYSVEACTSGGC
>034:SP:4681-4782
PSGLSAPVVMQADSTWIQLTWDYPTNTNGIITKFELLFGPGCPLTTQPFEQTCPVEDYIPINKDLALTHNVTGLKPYTRYDFVVAAYNDAGQVESSPVQENT
>035:SP:4800-4869
TNNSLITVSWAQSFVLNSMLREYILVENGDTVYSGIATAVTRPLKDMEYEFTVTCVTSTGSISYPTIIY

USH2A_hydMag: overall domain structure of usherin in Hydra magnipapillata (cnidaria), XM_002165140 (SMART domains):

EGF_like     1    60  2.5e-01
EGF_Lam     63   126  5.1e-08
EGF_Lam    129   186  2.2e-03
EGF_Lam    189   239  3.1e-12
EGF_Lam    242   285  2.0e-11
EGF_Lam    288   337  1.7e-09
EGF_Lam    340   390  3.3e-07
EGF_Lam    393   446  1.6e-07
EGF_like   449   485  9.8e-01
EGF_Lam    486   534  3.2e-05
FN3        538   612  2.9e-03
FN3        626   705  4.7e-08
FN3        719   804  9.4e-03
FN3        821   902  4.8e+00
LamG       990  1131  4.3e-23
LamG      1178  1311  4.0e-19
FN3       1394  1476  5.5e-04
FN3       1487  1634  2.1e-07
FN3       1645  1724  8.3e-04
FN3       1738  1826  6.9e-01
FN3       1840  1925  2.1e-02
FN3       1938  1995  4.7e+01
FN3       2006  2092  6.1e-06
FN3       2106  2180  6.0e-05
FN3       2191  2283  1.4e-01
FN3       2368  2444  6.2e-03
FN3       2523  2589  1.5e+01
FN3       2907  2987  7.9e+00
FN3       3002  3084  6.2e-08
FN3       3098  3172  1.0e-01
FN3       3186  3265  3.1e-04
FN3       3276  3357  3.8e-07
FN3       3371  3455  1.9e-07
FN3       3469  3544  5.5e-04
FN3       3558  3652  1.9e+00
FN3       3664  3739  2.5e+00
FN3       3753  3829  2.3e+01
FN3       3843  3917  5.6e-01
FN3       3928  4006  1.7e+00
FN3       4026  4108  2.1e+01
FN3       4119  4209  1.7e-05
transmem  4295  4317  -
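
The SMART listing above can be tallied mechanically. The following is a minimal sketch, assuming the four whitespace-separated columns are domain name, start, end and E-value; only an excerpt of the rows is embedded, and the 1e-2 cutoff is an illustrative choice, not a SMART default.

```python
from collections import Counter

# Excerpt of the SMART rows listed above: name, start, end, E-value.
SMART_ROWS = """\
EGF_Lam    486   534  3.2e-05
FN3        538   612  2.9e-03
FN3        626   705  4.7e-08
FN3        719   804  9.4e-03
FN3        821   902  4.8e+00
LamG       990  1131  4.3e-23
LamG      1178  1311  4.0e-19
FN3       1394  1476  5.5e-04
transmem  4295  4317  -
"""


def parse_smart(text):
    """Return (name, start, end, evalue) tuples; evalue is None when given as '-'."""
    domains = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, start, end, evalue = line.split()
        domains.append((name, int(start), int(end),
                        None if evalue == "-" else float(evalue)))
    return domains


def confident(domains, name, cutoff=1e-2):
    """Domains of one type whose E-value is below the (hypothetical) cutoff."""
    return [d for d in domains
            if d[0] == name and d[3] is not None and d[3] < cutoff]


domains = parse_smart(SMART_ROWS)
counts = Counter(d[0] for d in domains)
print(counts)
print(len(confident(domains, "FN3")), "of", counts["FN3"], "FN3 hits below cutoff")
```

Pasting the full table into SMART_ROWS gives the complete per-type domain count for the Hydra protein.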

Domains 20-25 of Hydra aligned to orthologous domains in human

20
hydMag  2907 PENLEAPVPTVLSS--DKIFVIWSLPNKPNGQISEYILYRLKWSTKEERLVYRGLQLSFLDMDGLEPYTGYLYILSACTVSCTNISASVLAYT 2997
             P+ +  P  T + +  D I + W  P + NG I  YIL R       ER  +RG  LSF D +G++P+  Y Y L ACTV+    S+ V+A T
homSap     1 PQGVSPPTWTKIDNLEDTIVLNWRKPIQSNGPIIYYILLR----NGIER--FRGTSLSFSDKEGIQPFQEYSYQLKACTVAGCATSSKVVAAT 87

21
hydMag  3002 PVGVQAPLLQAISSNSIQVNWSTPTYPNGKIIFYNVTILVKDIYRSLIPNDSIGEMMSLTVSGLLPYNSYTFKVYACTSIGCSSS 3086
             P  +  P + A+S+ ++ ++WS P   NG I  Y +    + + + LI  D+       TV+GL PY +Y+F + ACTS GC+SS
homSap     1 PESILPPSITALSAVALHLSWSVPEKSNGVIKEYQI----RQVGKGLIHTDTTDRRQH-TVTGLQPYTNYSFTLTACTSAGCTSS 80

22
hydMag  3094 LQAAPESI-LPPQITIINARMVELEWSPPKSPNGIILSYVLKRNLTVIYNG--MQLNLSDVSVLPATLYIYTVGASTLGGYTESNITVVRT 3181
             LQAAPE + + P+  IIN+  VEL WS P+ PNG++  Y L RN  +++ G   + N +D ++ P + Y Y +   T GG + S+  +V+T
homSap     1 LQAAPEGVWVTPRHIIINSTTVELYWSLPEKPNGLVSQYQLSRNGNLLFLGGSEEQNFTDKNLEPNSRYTYKLEVKTGGGSSASDDYIVQT 91

23 
hydMag  3182 PESTPQGIALP-NLYALSSSSINVSWSMPA-LSNGILISYTVYYQESDSLILKKPASLNFSVLITQLKTFTLYFVRIEACTIVGCGSSDKASVRT 3274
             P STP+ I  P N+  +   SI V+W  P  L   I + Y V   +     L      + S L+  L  FT Y +RI+AC    CG S +  V+T
homSap     1 PMSTPEEIYPPYNITVIGPYSIFVAWIPPGILIPEIPVEYNVLLNDGSVTPLAFSVGHHQSTLLENLTPFTQYEIRIQACQNGSCGVSSRMFVKT 95

24 
hydMag  3276 EEPPVGQQSPTLFARGTTIVEISWVQPLFPNGIILNYQVERRSSMVSY--IIYVGLR--TDYIDT--QLTAYTEYEYRVRSVNSKGESISEWSKVRT 3366
             E  P+   SP L A G+  +EI W+ P  PNGII+NY + RR + +    +++V      +++D    L  +T YEYRVR+ NSKG   S WS  +T
homSap     2 EAAPMDLNSPVLKALGSACIEIKWMPPEKPNGIIINYFIYRRPAGIEEESVLFVWSEGALEFMDEGDTLRPFTLYEYRVRACNSKGSVESLWSLTQT 98

25
hydMag  3368 EGVPENVQPPFIEVLNSSSIFAKWDPPTTINGILIEYSL--QIR----LFSQPQLIEDIRCCIKPSVTKVQVDGLLPFTSYEFRVVASTFIGSSYSPWVLVR 3463
             E  P++   P+ +  ++ S+   W  P + NGI+  Y +  Q R     F+ P +       +K +  +  + GL PFT+Y   VVA+   G   SPW L++ 
homSap     2 EAPPQDFPAPWAQATSAHSVLLNWTKPESPNGIISHYRVVYQERPDDPTFNSPTVHA---FTVKGTSHQAHLYGLEPFTTYRIGVVAANHAGEILSPWTLIQ 100

Known splice variations

With over 15,000 bp needed for an mRNA encoding the full-length gene yet typical cDNA reads of 600 bp, more than 25 reads are required simply to tile the gene once. However, reads tend to pile up at the termini, so realistically several hundred transcripts would be needed in each of several species to establish splice variants with any phylogenetic depth.
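
The tiling arithmetic can be checked with a short sketch. It assumes perfectly placed, non-overlapping reads, which is a lower bound only; the 5202-residue protein length is taken from the introduction, and the function name is illustrative.

```python
import math


def min_reads_to_tile(transcript_bp, read_bp):
    """Fewest non-overlapping reads of read_bp needed to cover transcript_bp."""
    return math.ceil(transcript_bp / read_bp)


READ_BP = 600
CODING_BP = 5202 * 3  # coding sequence only, excluding stop codon and UTRs

print(min_reads_to_tile(15_000, READ_BP))     # round figure used in the text
print(min_reads_to_tile(CODING_BP, READ_BP))  # coding sequence of 15,606 bp
```

Real reads overlap and cluster at the transcript ends, so the number needed in practice is far higher, as the text notes.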

A great many human alternative splices are evidently transcript noise resulting in non-functional protein (as is clear from 7-transmembrane GPCR examples which are necessarily dysfunctional). It's difficult to understand how USH2A could be so exquisitely sensitive to (sometimes quite mild) point mutations along its entire length yet still function after large deletions wipe out whole domains, sometimes fractionally.

Establishing the functional significance of supported variants would require testing in a mouse model of the disease. Such models exist for both USH2A and USH2C (gene name GPR98), but this kind of testing is difficult for any long gene. Thus alternative splicing is not a favorable object for study. Nonetheless, the 5 reported splice variants in human USH2A are worth further consideration. In particular, an alternate splice with donor preceding exon 59 and acceptor within exon 64 would delete six Fn3 repeats (residues 3580–4121 in mouse numbering); a second variation involves expression of exon 71 in inner ear but not retina.

USH2A alt.jpg

The current status of variant transcripts about S3743 in exon 56 can be studied by Blat in human and mouse with expression tracks open. No variant splices affecting this region have ever been reported to GenBank in either mouse or human as of July 2009.

>USH2A_hs_55-57 
GLQPYTNYSFTLTACTSAGCTSSEPFLGQTLQAAPE
GVWVTPRHIIINSTTVELYWSLPEKPNGLVSQYQLSRNGNLLFLGGSEEQNFTDKNLEPNSR
YTYKLEVKTGGGSSASDDYIVQTPMSTPEEIYPPYNITVIGPYSIFVAWIPP

>USH2A_mm9_55-57
GLQPYTNYSFTLAVCTSVGCTSSEPCVGQTLQAAPQ
GVWVTPRHIIINSTTVELYWNPPERPNGLISQYQLRRNGSLLLVGGRDNQSFTDSNLEPGSR
YIYKLEARTGGGSSWSEDYLVQMPLWTPEDIHPPCNVTVLGSDSIFVAWPTP
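
The two peptide blocks above have the same length line for line, so they can be compared column by column without gaps. A minimal sketch, assuming each of the three lines is one contiguous segment (the segment numbering is only a label):

```python
# Human and mouse peptides copied from the two FASTA records above, one string per line.
HUMAN = [
    "GLQPYTNYSFTLTACTSAGCTSSEPFLGQTLQAAPE",
    "GVWVTPRHIIINSTTVELYWSLPEKPNGLVSQYQLSRNGNLLFLGGSEEQNFTDKNLEPNSR",
    "YTYKLEVKTGGGSSASDDYIVQTPMSTPEEIYPPYNITVIGPYSIFVAWIPP",
]
MOUSE = [
    "GLQPYTNYSFTLAVCTSVGCTSSEPCVGQTLQAAPQ",
    "GVWVTPRHIIINSTTVELYWNPPERPNGLISQYQLRRNGSLLLVGGRDNQSFTDSNLEPGSR",
    "YIYKLEARTGGGSSWSEDYLVQMPLWTPEDIHPPCNVTVLGSDSIFVAWPTP",
]


def identical_columns(a, b):
    """Count positions with the same residue in two equal-length peptides."""
    if len(a) != len(b):
        raise ValueError("ungapped comparison needs equal lengths")
    return sum(x == y for x, y in zip(a, b))


for number, (hs, mm) in enumerate(zip(HUMAN, MOUSE), start=1):
    same = identical_columns(hs, mm)
    print(f"segment {number}: {same}/{len(hs)} identical "
          f"({100 * same / len(hs):.1f}%)")
```

This is a plain identity count, not a scored alignment; it only shows how far the mouse sequence has drifted around the S3743 region.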


Structural significance

The 3D structure of the 22nd FN3 domain of USH2A could be evaluated using best-blastp to a structurally determined FN3 domain in PDB, then modelling the FN3 domain in question by submitting it to SwissModel with both S3743 and T3743. Here the percent identity to an already-determined structure is mediocre but perhaps still sufficient.

If the serine at 3743 is on the surface and involved in a binding interaction with a second (unknown) protein, then the effect of the 3743N substitution would be very difficult to evaluate because asparagine and serine are of similar bulk and polarity and the binding structures would be unlikely to have a PDB representative.

While S <--> N is a benign substitution at many positions in many proteins, at residue 3743 it appears that the hydroxyl lacking in asparagine is critical because, to the extent that any substitution at all is tolerated, here it is threonine. Bulk too may play a role because tyrosine is never seen.

The two best blastp hits at PDB of the S3743 region of USH2A are shown below -- quite weak but to FN3 domains of other proteins. The first match has threonine at position homologous to 3743, the second serine, so both models may have utility. In both cases the critical residue is in the final beta strand of an anti-parallel sandwich of sheets. Tandem FN3 pairs have been structurally determined for more dimly related proteins.

Note how the alignments below strengthen the case for the significance of the residue at 3743. It follows a conserved proline motif that is part of a turn and not in a beta strand. The serine (or threonine) then begins the last beta strand of the top anti-parallel sheet. The sidechain itself is not directly involved in the beta sheet hydrogen bonding scheme, which uses the backbone carbonyl and amide hydrogens. This turn and strand start are immensely conserved in FN3 domains -- over many hundreds of millions of years -- even as many other regions have diverged beyond recognizability.

USHA2A 3D.jpg
 pdb|1X5L|A second Fn3 domain of ephrin receptor s EPHA8 Identities 29% Positives 47% 

USH2A GVWVTPRHIIINSTTVELYWSLPEKPNGLVSQYQLSRNGNLLFLGGSEEQNF----------TDKNLEPNSRYTYKLEVKTGGG  
       V V  R      T+V L W  PE+PNG++ +Y++       +    E Q++          T   L+P +RY +++  +T  G
1X5L  QV-VVIRQERAGQTSVSLLWQEPEQPNGIILEYEIK-----YYEKDKEMQSYSTLKAVTTRATVSGLKPGTRYVFQVRARTSAG
                                                                           beta strand 6 
 pdb|1WFO|A The Eighth Fn3 Domain Of Human Sidekick-2 SDK2 Identities 27% Positives 47%
                                                                     ...L.Pft.Y...v.act..G consensus of 35 USH2A FN3 repeats
USH2A             INSTTVELYWSLPEKPNGLVSQYQLSRN--------GNLLFLGGSEEQNFTDKNLEPNSRYTYKLEVKTGGG-SSASDDYIVQT
                  + +T+V L W  P  PNG++  YQ++            +  L  S  Q +T   L+P S Y +++  +T  G   A++  +V T
1WFO              VRTTSVRLIWQPPAAPNGIILAYQITHRLNTTTANTATVEVLAPSARQ-YTATGLKPESVYLFRITAQTRKGWGEAAEALVVTT
                                                                           beta strand 8


Normal function of USH2A

A dozen recent papers on hearing and vision have illuminated the normal function of the USH2A gene product through its binding to other domains and proteins and through related diseases. This is important progress but still not sufficient to explain how and whether specific fibronectin variants lead to disease phenotypes.

Protein binding partners of USH2A (called usherin as protein) are important to identify because surface contacts in hetero-oligomers co-evolve and so have explanatory potential for non-synonymous variation in USH2A. Prime suspects include gene products that themselves cause vision and hearing disorders. The gene LCA5, responsible for Leber congenital amaurosis V, encodes lebercilin, a 697 residue protein involved in centrosomal and ciliary function. Lebercilin and USH2A do not interact directly but rather via the NLP-encoded ninein-like protein at the basal bodies of the photoreceptor connecting cilia.

Both lebercilin and ninein-like protein have extended coiled-coil domains in addition to SMC (an ancient centrosomal chromosome segregation domain) and EF-hands (in NLP). NLP has not surfaced to date in vision disorders, perhaps because of an essential role in mitosis, a notion somewhat at variance with its moderate rate of divergence. Alternative alleles of NLP -- and indeed LCA5 through chain transitivity of binding effects -- could compensate for or exacerbate pathogenic alleles of USH2A.

Coiled-coil helices could plausibly bind to an established groove on USH2A fibronectin domains though this has not been established as the NLP interaction mode. USH2A also has laminin, EGF and PDZ1 domains which commonly interact with other proteins.

Laminin domains of USH2A, residues 518-1052 in short form numbering, bind the 7S domain of type IV collagen, with USH2B mutations in loop b (but not loop d) abolishing this binding. These experiments need to be revisited using full length USH2A.

USHaudit.jpg

The PDZ1-binding domain at the USH2A C-terminus (cytoplasmic side) binds both whirlin (DFNB31) and the scaffold protein harmonin (causative for USH1C) at synaptic terminals of both retinal photoreceptors and inner ear hair cells (and transiently during stereocilia development). Cell adhesion molecules in pre- and post-synaptic membranes keep synaptic clefts in proper register.

Harmonin plays a master scaffolding role across numerous proteins involved in type 1 Usher diseases, so this linkage to USH2A (and GPR98 of USH2C) begins to explain how mutations in so many genes can give rise to similar disorders. These sometimes involve compound cross-gene disorders: the known digenic pairs include PCDH15/CDH23 (called USH1H), PRPH2/ROM1 (also called USH1H) and MYO7A/CLRN1 (disease USH3A). The tetraspanins PRPH2 and ROM1 are full length paralogs of retinal outer segment disks not related to the multipass membrane protein CLRN1.


Ushcil.jpg

USH2A has a very specific localization within retinal rod cells (and is not expressed at ribbon synapses), a location that clarifies the role of its fibronectin domains to a certain extent. USH2A localizes to the plasma membrane of inner segments (but not that of the non-motile cilia), internally with its PDZ-motif presumably anchored to the various PDZ proteins mentioned above and externally with laminin and fibronectin domains surrounding the cilium that connects to the rhodopsin-containing outer segment, perhaps spanning the 1000 angstrom wide band of periciliary matrix material between the plasma membranes of the inner segment and cilium. The physical proximity to the product of the retinitis pigmentosa GTPase regulator gene RPGR (no associated deafness so not in the Usher complex) is perhaps suggestive of an interaction, but RPGR does not contain any domains that would bind to fibronectin.

The non-progressive deafness in USH2A can be explained by its transient expression in kinocilia. These, not the so-called stereocilia (actin filament-based membrane protrusions), are the structures related to the photoreceptor connecting cilia. Kinocilia resemble the membranous protrusions of the periciliary ridge complex of photoreceptors and could conceivably be homologous, recalling the many deep connections between the evolutionary origin of these sensory systems dating to cnidarian rhopalia.

The DFNB31 gene product whirlin, USH2A and GPR98 co-localize at the cilium and the outer membrane of photoreceptor cells as well as in spiral ganglion neurons of inner ear but this is related inside the cell to PDZ domain binding so does not illuminate extracellular fibronectin domains. However GPR98 (called VLGR1 as expressed protein and USH2C as disease) is also transmembrane and its ectodomain comprises some of the fibrous links connecting these adjacent membranes.

Among the 143 disease genes at RetNet, USH2A has partial homology matches only to GPR98 (at the amino terminal LamGL domain) and to JAG1 and CRB1 (to 9 tandem EGF_Lam domains). To the extent that like domains dimerize, LamGL is the most promising outside the cell as it comes at the most extended position of the USH2A ectodomain. Such binding would be reminiscent of PCDH15 (USH1F) and CDH23 (USH1D) forming parallel homodimers along their length but binding paralogously through their terminal cadherin domains in stereocilia hair cell links. (Despite an immense number of cadherins in the overall proteome, PCDH15, CDH23, and CDH3 are the only ones that have surfaced in vision and hearing disorders.)

Weak alignment between LamGL domains of USH2A and GPR98 could reflect conserved 3D structures.
Residues A125T, C163Y, V218E, V230M, G268R (blue polymorphisms, red known disease) are known variations; if binding to GPR98
occurs here, these could surface in digenic disease.
 
0121 HSNSASFIFGNHKSCFSS------PPSPKLMASFTLAVWLKPEQQGVMCVIEK-TVDGQI 0173 USH2A
     || + +  |   +  | +      |     +|+|| + |+ |       +| |   +| |
1322 HSGTDALYFTGLEGAFGTVNPKYHPSRNNTIANFTFSAWVMPNANTNGFIIAKDDGNGSI 1381 GPR98

0174 VFKLTISEKET----MFYYRTVNGLQPPIKVMTLGRILVKK-WIHLSVQVHQTKISFFIN 0228 USH2A
      + + |   |+      +|+|+      |   |+ + | +  |+|| + +    | |+++
1382 YYGVKIQTNESHVTLSLHYKTLGSNATYIAKTTVMKYLEESVWLHLLIILEDGIIEFYLD 1441 GPR98

0229 GVEKDHTPFNARTLSGSITDFASGTVQIGQSLNGLEQFVGRMQDFRLYQVALTNREILEV 0288 USH2A
     |   +  |   ++| |       | ++||  +|| ++| | ||| | |+  ||  || |+
1442 G---NAMPRGIKSLKGEAITDGPGILRIGAGINGNDRFTGLMQDVRSYERKLTLEEIYEL 1498 GPR98

Vezatin, a ubiquitous adherens cell-cell junction protein whose primary interactions are with myosin VIIa via a FERM domain and the cadherin-catenins complex, also has a 2-transmembrane component. MYO7A (but not VEZT) has been implicated in Usher disorders. Vezatin has no recognized domains, but a coiled-coil region 430..462 and transmembrane regions 139..159 and 162..182 have been predicted by ModBase and SwissProt, so it is doubtful as an extracellular USH2A binding partner.

Keeping in mind that these proteins can be expressed at many other sites in the organism, the overall retinal cytoplasmic Usher protein complex can be understood in the context of a reloading zone. Recall that photoreceptor membranes in the outer segment turn over rapidly via RPE phagocytosis and so require constant renewal of their components from the inner segment. This material is vectorially transported within the inner segment and then through the connecting cilium to the outer segment, requiring reloading in a specialized compartment of the apical inner segment.

USHtransport.jpg

The Usher protein complex in rod photoreceptor cells is usefully envisioned at left by T Maerker et al in Hum Mol Gen 2008 17(1):71. Protein interactions and vesicle transport in the ciliary/periciliary region. Ectodomains of USH2A and GPR98 connect membranes of connecting cilium to inner segment. Whirlin PDZ domain anchors both in the cytoplasm. Dynein mediates vesicle transport along microtubules to the apical inner segment collar. Docking and fusion membrane sites are determined by the fixed Usher protein complex. Note that USH2A does not actually reside on the cilium side, and binding to GPR98 and other USH2A chains (by unspecified domains) is speculative.

Note that fibronectin domains cannot merely serve as spacers with laminin domains doing the anchoring because the fibronectin domains are far too conserved. They cannot bind each other in anti-parallel fashion because all project out from the same inner segment side. GPR98 is a very peculiar orphan GPCR with single pentraxin, LamGL and EPTP domains and many Calx-beta domains (that don't seem to bind calcium) in its long N-terminal ectodomain; some of these resemble integrin-beta4 domains.

Of the known proteins in the Usher network, GPR98 is the most promising in terms of perhaps having domains that bind and thus co-evolve with USH2A. But without a 3D structure of FN3 binding to some GPR98 domain, the importance of surface residues in fibronectins cannot be evaluated. Digenic disease, not yet known for this pair of genes, would imply domain binding but more likely involving LamGL paralogous domains rather than FN3. Fibronectin itself has been proposed as a binding partner (to the EGF-Lam region) but in a study using the incomplete gene. Integrin and heparin bind to a FN3 domain of fibronectin with some molecular modeling available, but this seems so sequence-specific that its relevance to S3743N of USH2A is problematic.

However, the list of proteins in the Usher network has grown steadily in the last 2-3 years and perhaps other disease genes will add to the 38 relevant ones at RetNet. Indeed a recent photocilium proteomics report finds rather too many new genes: 1968 proteins in mouse rod outer segment, its cytoskeleton, axoneme, basal body, and ciliary rootlet. These include 13 genes known to cause cilia disease and 7 intraflagellar transport proteins, though the bulk of the protein list comes from the outer segment (based on rootletin knockouts). Protein copy number gives some idea of relative stoichiometries based on 70 million rhodopsin molecules per mouse rod cell and GNAT1 transducin at a tenth that level. By comparison, USH2A is present at 3,000 copies, CROCC (rootletin) at 220,000 copies, MYO7A at 1,000 copies and RP1 at 14,000 copies, but oddly GPR98, DFNB1 and VEZT were not detected.
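
The copy numbers quoted above translate into stoichiometries relative to rhodopsin as follows. This is a minimal sketch using only the figures in the paragraph; the dictionary and function names are illustrative.

```python
# Copies per mouse rod cell, as quoted in the paragraph above.
RHODOPSIN_COPIES = 70_000_000
COPIES = {
    "GNAT1": RHODOPSIN_COPIES // 10,  # transducin at a tenth of rhodopsin
    "CROCC": 220_000,
    "RP1": 14_000,
    "USH2A": 3_000,
    "MYO7A": 1_000,
}


def rhodopsin_per_copy(protein):
    """Number of rhodopsin molecules per molecule of the named protein."""
    return RHODOPSIN_COPIES / COPIES[protein]


for name in COPIES:
    print(f"{name:6s} 1 per {rhodopsin_per_copy(name):,.0f} rhodopsin")
```

On these numbers usherin is a trace component, at about one molecule per 23,000 rhodopsins. That fits a localized structural role better than a bulk one.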

NLPush2a.jpg

Usherin also interacts with NLP-encoded ninein-like protein, a centrosome component that also binds lebercilin (the LCA5 gene product affected in Leber congenital amaurosis). Once again the binding is to the cytoplasmic domain of usherin rather than the fibronectin ectodomain, localizing to basal bodies of photoreceptor-connecting cilia. NLP has 3 EF hands and various coiled-coil domains; one EF hand is still calcium-competent but this is not relevant to the usherin interaction.

Oddly, neither NLP on chr 20 nor its paralog NIN on chr 14 has been implicated to date in either vision or hearing disease (eg Usher or LCA), perhaps because both are too important to basic processes such as cell division. NIN perhaps offers some clues as it anchors microtubule minus-ends to the centrosome via its central coiled-coil domain, a structure also found in NLP and LCA5 and its paralog LCA5L. These proteins all contain extensive regions of intrinsic disorder so are probably extended rather than adopting conventional globular folds.
The interaction docks the near-terminal residues 5124–5196 of USH2A to 656–925 on NLP (its weakly-predicted intermediate filament domain). That is not a particularly well-conserved region of the USH2A protein outside of mammals, raising questions about the antiquity of this interaction. No USH2A disease allele lies within it -- the nearest is position 5031. It does not appear plausible that a cytoplasmic region of usherin, across the transmembrane segment and so remote from the ectodomain, could be impacted by fibronectin variations such as S3743N. There seems to be a disconnect between internal and external protein interactions of usherin -- which region is really responsible for its localization? The cytoplasmic function seems more important to formation of cellular substructures, but perhaps these in their entirety depend on external usherin anchoring for correct localization and stability (at least in specialized photoreceptor and auditory cell types).

   homSap  LRIPSQNQTSLTYSQGSLHRSVSQLMDIQDKKVLMDNSLWEAIMGHNSGLYVDEEDLMNAIKDFSSVTKERTT USH2A 5124–5196 
   panTro  ......S......T............................V...S...............G..........
   gorGor  ......S.......................................................G.....E....
   ponAbe  ......S....................V........D.............C...........G..........
   rheMac  ......S..G.................V........D.........................G.......H..
   calJac  ......S.............................D.........................G.......H..
   tarSyr  ......S.V..N....................A.T.D....T....................G.......H..
   tupBel  ......S.I.Q..P..C.................I.D....T................D...G.......H..
   turTru  ......S.V.Q.....A........I........I.D....T....D...............G...M...H..
   equCab  .H....S.I.Q.......................I.D....T....D...............G.......H..
   felCat  ......S.I.Q................F......I.D....T....D...............G.......H..
   canFam  ......S.I.Q........C.......L......V.D....T....D...............G.......H..
   bosTau  ......SHG.Q.......................I.D....T....D...............G.......H..
   dasNov  ......SHL.Q....E...H.I............I.D....T....D...............G.......H..
   monDom  ......S.M.R....P..................T.D....T....D......D........G.......H..
   choHof  ......S.I.Q........C.I............T.D....T....AN.........I....G.......H..
   ochPri  ......SHMNQ...T............F.....VI.D....T................K...G.......H..
   echTel  ......SHI.PP.............L.VR....PV.D....T....D...............G.......H..
   cavPor  ..V...S.L.QA.............L.G......L.D....T.V..S............V..G.......H..
   dipOrd  ......S.L.QS...S..................T.E....T....SR..............G..........
   pteVam  .H..R.S...Q.C.R.P..........L........ST...T....D...............G.......H..
   musMus  ......S.L.HA...S...........MA....VTED....T....S........E......G.......H.A
   ratNor  ......S.L.HA...............SP...A.TED....T....S...C....E......G.......H.A
   myoLuc  ......S.L.Q--A...........L.LH.....V.DA...T....D....L..........G.......H..
   vicPac  .......PV.QPC.G.A.P......L.VP...A.I.D....TVL..D...............G.......H..
   otoGar  ......-.I.RA.RE-P.....R.............D....T....S...............G.......H..
   ornAna  ......S.LGR..............I.M..Q.I.IGD.M..TVTA.DNR.CT.N...V....N.......H..
   anoCar  .QA...M.GT---..S.........I.VY...S.IEDAV.DT.I.QE.S....DA..V....G..T....H..
   galGal  G.R...S.LNR...PA.........L..Y...SFA.ELP.DT.I---NA..AE.D..V.V..G..T....H..
   taeGut  G.RQ..G.L.R...PA.........LNLY...TF--E.P.D..L---NA........V.V..G..T....H..
   xenTro  .QVRKASHV.HSF..N..Y..A...IHSHH..S.VES...DSAL..D..M.TED...IST..G..TM..QH.A
   tetNig  ..L.P.TDLTPA...HC......R.I.--G.SLM.EERT.DNPA..D.....E.DE.VD.L.TL.T.K..H.M
   gasAcu  ..V...TDL.QA...H.........I.--R.SLMLEEGS.DNPL..D...C.E..EFVD...SLRGGK..Q.M
   oryLat  ..V.N.TEFTQA...H.........I.--Q.SLMVEEGS.DNPL..D.....E...FVD...ALG.AK..H.M
   Consen  lrips.sq.s..ysq.s.h..!....#..d.k.l.ddsl.#ti.ghdsg.yv####l.#a..gfssvt..h.t
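
In the difference alignment above, a dot means "same residue as the human reference row". A row can be expanded back to a full sequence and scored as sketched below, using only the first 14 columns of the homSap and panTro rows to keep the example short. Gap characters ('-') are simply carried through, and the function names are illustrative.

```python
def expand_row(reference, row):
    """Replace each '.' in a difference-alignment row with the reference residue."""
    if len(reference) != len(row):
        raise ValueError("row and reference must have the same number of columns")
    return "".join(ref if ch == "." else ch for ref, ch in zip(reference, row))


def percent_identity(row):
    """Percentage of columns displayed as '.', ie identical to the reference."""
    return 100.0 * row.count(".") / len(row)


# First 14 alignment columns of the homSap (reference) and panTro rows.
HOMSAP_PREFIX = "LRIPSQNQTSLTYS"
PANTRO_PREFIX = "......S......T"

print(expand_row(HOMSAP_PREFIX, PANTRO_PREFIX))
print(f"{percent_identity(PANTRO_PREFIX):.1f}% identical to human")
```

Applied to full rows, the same identity measure quantifies how much faster this cytoplasmic segment diverges outside mammals.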

Functional significance

Here human individuals homozygous for N3743 could be examined for early loss of hearing accompanied by loss, as teenagers, of mid-peripheral and night vision. Heterozygous S3743N compounded by a different USH2A mutation (a known autosomal recessive causative allele) on the other chromosome 1 would also imply pathogenicity of N3743.

Alternatively, since mouse has a moderately conserved orthologous 22nd FN3 domain at 77.5% identity including the serine, the effect of S3743N could be studied as a knockin. Even if the mouse gene did serve as a valid disease model for other alleles, symptoms for S3743N might or might not develop within the 2-3 year lifespan of the laboratory mouse.

USH2A22homMus.jpg

For the immediate term, comparative genomics is the best available guide. Here it is clear that S3743 is immensely conserved over several billion years of evolutionary time in those clades observable via genome projects (transcripts are too rare in this long gene to sample species diversity further). This establishes that N3743 is not part of the acceptable reduced alphabet at this residue, though T3743 appears to have been acceptable at one time in the teleost ancestor (and indeed is retained to the present day in early diverging deuterostomes).

The difference alignment below of the exon containing S3743 shows overall conservation well above the human proteome average but not extraordinary inflexibility at most positions. The fibronectin portion is evolving as well, no doubt through drift, internal adaptive change, and co-evolutionary response to binding partner change.

Consequently S3743N -- despite its innocuous-appearing nature (ie high Dayhoff matrix score) -- is likely to have significant non-adaptive impacts on either the standalone structure of USH2A protein or its interaction with other proteins in the basement membrane. If selective pressure did not exist to maintain S3743, then what would account for its constancy despite copious observed variation in nearby residues over the same time span?

The large number of known loci throughout this protein that give rise to Usher Syndrome 2A suggests not only that this protein plays an exceedingly important structural role highly sensitive to seemingly minor mutational perturbations but also that no other gene product is able to compensate for its absence.

Of the 22 known disease-causing point mutations that lie within a FN3 domain, none is situated at a position homologous to S3743; the closest are P1212L at -3 of the 2nd FN3 domain, T3571M at +10 of the 20th FN3 domain, and T4425M at +10 of the 29th FN3 domain.

In a sense, the real mystery with USH2A is how nearly all of its 35 FN3 domains could be so mission-critical that a slight perturbation in one cannot be compensated for by the strength of interaction in the remaining 34. Here the overall 3D structure is not known, yet surely it is very extended like all other studied fibronectin proteins, for example titin, which extends for 20,000 angstroms, longer than an entire bacterial cell. Titin has 33,000 residues made up of 195 immunoglobulin Ig and 132 fibronectin FN3 domains. By proportionality then (as the domain types are of similar size), USH2A would extend outside the cell for nearly 3,000 angstroms.
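
The proportionality estimate can be written out explicitly. This is a rough sketch that assumes extended length scales linearly with residue count, as the paragraph does; the constants come from the text and the introduction, and the function name is illustrative.

```python
TITIN_LENGTH_ANGSTROM = 20_000
TITIN_RESIDUES = 33_000
TITIN_DOMAINS = 195 + 132  # Ig plus FN3 domains
USH2A_RESIDUES = 5_202     # full-length usherin, from the introduction


def scaled_length(residues):
    """Extended length in angstroms, scaling titin's length by residue count."""
    return TITIN_LENGTH_ANGSTROM * residues / TITIN_RESIDUES


print(f"USH2A full length: ~{scaled_length(USH2A_RESIDUES):,.0f} angstroms")
print(f"titin span per domain: ~{TITIN_LENGTH_ANGSTROM / TITIN_DOMAINS:.0f} angstroms")
```

The full 5202-residue chain scales to a little over 3,000 angstroms. The ectodomain alone would be somewhat shorter, since the transmembrane and cytoplasmic residues do not contribute.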

The answer may be in the observation that Usher 2A is not noticed at birth unless hearing is tested and vision problems arise only in the second decade. With sensory systems we are perhaps more sensitive to functional loss than at other basement membrane sites of USH2A expression.

Speculatively, USH2A protein may not be replenished over the lifespan in the basement membranes of these terminally differentiated cell types and slight dysfunctionalities might lead to slight enhancements in turnover rate, over decades leading to excessive loss of cell matrix structure and perhaps death or inability of the hosting cell to carry out its other functions.


USH2A allele assessment by PolyPhen is suboptimal

Despite 60 years of work (that began in 1949 with E6V in sickle cell hemoglobin), it remains very difficult to accurately interpret observed coding variation in human genes in terms of disease. This is illustrated in the modern era in the Watson human genome, which is homozygous for known disease alleles in genes for Usher 1A and Cockayne syndromes, yet the individual remains asymptomatic at age 80. Consequently, the goal for S3743N must be lowered from a definitive diagnosis to merely making the best interpretation that consideration of all current data allows.

The much-cited SNP evaluation tool, PolyPhen, works at a disadvantage here because its algorithm uses only comparative genomics derived from blastp matches at SwissProt rather than from NCBI wgs and nr, here locating a meagre 6 homologs for USH2A vs 46 species actually available, resulting in a mediocre estimation of the reduced alphabet at 3743. Orthology is assumed, not established -- a problematic procedure since this could mix in lineage-specific FN3 domain gains and losses that later re-functionalized perhaps with divergence of constraints on the critical residue.

Sequences are also not considered in context of their phylogenetic tree position so cumulative supporting branch length time is an unavailable metric (eg PolyPhen treats sea urchin as equally informative as mouse). The fibronectin domain and its type (III) are not explicitly recognized by the algorithm and so its experimental domain literature -- and that of USH2A -- are ignored. Paralogous residues to S3743 in internal and external FN3 domains are thus not in the mix. Because of a 50% identity cutoff, PolyPhen misses here diverged but still highly informative PDB structures.

Scoring the list of known Usher Syndrome 2A causative alleles that lie in USH2A fibronectin domains gauges PolyPhen's accuracy on them: typically 70% of disease alleles are correctly identified, but here only 12 of 22 are scored correctly (55%), treating 'possibly damaging' as a miscall for reasons given elsewhere.

None of the clinically assessed FN3 mutations lie in the 22nd FN3 domain and none lie at the internal paralog position corresponding to S3743. However, 3 do lie in the conserved patch around 3743 in paralogous domains. The tendency for residues important to normal function to occur in patches has not yet been systematically evaluated in PolyPhen but may be useful in weighting allele interpretation.

wt pos   var score  PolyPhen call      note
S  3743  T  0.364  benign             naturally occurring
Q  4592  H  1.308  benign
A  2795  S  1.317  benign
S  3743  N  1.348  benign             uncertain pathogenicity
T  3976  M  1.644  possibly damaging
S  3743  R  1.729  possibly damaging  naturally occurring
P  3504  T  1.757  possibly damaging
A  2249  D  1.828  possibly damaging
T  4337  M  1.835  possibly damaging
R  2354  H  1.909  possibly damaging
T  4439  I  1.933  possibly damaging
L  4795  R  2.018  probably damaging
T  4425  M  2.050  probably damaging
T  3571  M  2.050  probably damaging
S  4054  I  2.064  probably damaging
G  3895  E  2.274  probably damaging
R  3124  G  2.429  probably damaging  uncertain pathogenicity
P  4232  R  2.621  probably damaging
R  4115  C  2.654  probably damaging  uncertain pathogenicity
P  4818  L  2.724  probably damaging
Y  4487  C  2.758  probably damaging
P  1059  L  2.846  probably damaging  uncertain pathogenicity
P  1212  L  2.846  probably damaging
W  3521  R  3.902  probably damaging
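
The rows above can be tallied by call with a few lines of Python. This is only a sketch: the score cutoffs of 1.5 and 2.0 are an assumption inferred from this table, not taken from PolyPhen documentation, and the script checks only that every listed call is consistent with them. It does not attempt to reproduce the 12-of-22 figure, since the inclusion rules behind that count are not spelled out here.

```python
from collections import Counter

# Rows transcribed from the table: (wild type, position, variant, score, call).
ROWS = [
    ("S", 3743, "T", 0.364, "benign"),
    ("Q", 4592, "H", 1.308, "benign"),
    ("A", 2795, "S", 1.317, "benign"),
    ("S", 3743, "N", 1.348, "benign"),
    ("T", 3976, "M", 1.644, "possibly damaging"),
    ("S", 3743, "R", 1.729, "possibly damaging"),
    ("P", 3504, "T", 1.757, "possibly damaging"),
    ("A", 2249, "D", 1.828, "possibly damaging"),
    ("T", 4337, "M", 1.835, "possibly damaging"),
    ("R", 2354, "H", 1.909, "possibly damaging"),
    ("T", 4439, "I", 1.933, "possibly damaging"),
    ("L", 4795, "R", 2.018, "probably damaging"),
    ("T", 4425, "M", 2.050, "probably damaging"),
    ("T", 3571, "M", 2.050, "probably damaging"),
    ("S", 4054, "I", 2.064, "probably damaging"),
    ("G", 3895, "E", 2.274, "probably damaging"),
    ("R", 3124, "G", 2.429, "probably damaging"),
    ("P", 4232, "R", 2.621, "probably damaging"),
    ("R", 4115, "C", 2.654, "probably damaging"),
    ("P", 4818, "L", 2.724, "probably damaging"),
    ("Y", 4487, "C", 2.758, "probably damaging"),
    ("P", 1059, "L", 2.846, "probably damaging"),
    ("P", 1212, "L", 2.846, "probably damaging"),
    ("W", 3521, "R", 3.902, "probably damaging"),
]

# Assumed cutoffs, inferred from the table rather than from PolyPhen itself.
POSSIBLY_CUTOFF = 1.5
PROBABLY_CUTOFF = 2.0


def call_from_score(score):
    """Map a score to a call using the assumed cutoffs."""
    if score >= PROBABLY_CUTOFF:
        return "probably damaging"
    if score >= POSSIBLY_CUTOFF:
        return "possibly damaging"
    return "benign"


tally = Counter(row[4] for row in ROWS)
mismatches = [row for row in ROWS if call_from_score(row[3]) != row[4]]

print(dict(tally))
print("rows inconsistent with the assumed cutoffs:", mismatches)
```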

PolyPhen thus treats S3743N as borderline benign, based primarily on the general innocuousness of S->N; its algorithm proceeds without a valid description of the reduced alphabet at this position (S,T: hydroxyl) or knowledge of the subsequent fixation of S in amniotes. It is far from ideal to use BLOSUM substitution matrices, which are broad averages over unrelated proteins and so greatly enriched for bland substitutions such as S->N that are neutral most of the time (indeed by design of the genetic code).
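
The contrast between a generic matrix and a position-specific reduced alphabet can be sketched in Python. The reduced alphabet {S, T} is taken from the text above; the three BLOSUM62 values are hardcoded here rather than loaded from a library, and the helper names are illustrative only.

```python
# Reduced alphabet at position 3743 as stated in the text (S, T: hydroxyl).
REDUCED_ALPHABET_3743 = frozenset("ST")

# BLOSUM62 scores for substitutions away from serine; only the entries used here.
BLOSUM62_FROM_S = {"T": 1, "N": 1, "R": -1}


def within_reduced_alphabet(variant, alphabet=REDUCED_ALPHABET_3743):
    """True if the variant residue stays inside the reduced alphabet."""
    return variant in alphabet


def compare(variants="TNR"):
    """Pair each S3743 substitution with its generic and position-specific view."""
    return [
        (f"S3743{v}", BLOSUM62_FROM_S[v], within_reduced_alphabet(v))
        for v in variants
    ]


for name, blosum, inside in compare():
    print(f"{name}: BLOSUM62 {blosum:+d}, within reduced alphabet: {inside}")
```

BLOSUM62 gives S->T and S->N the same score, so a generic matrix cannot separate the two, whereas the position-specific alphabet accepts only T.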

In summary, algorithmic approaches to coding SNP classification such as PolyPhen rank high in convenience but fall short (see 1,2) by not using all available information, leading to suboptimal annotation. Such tools have value in quickly screening millions of alleles for glaring anomalies but are not particularly useful for specific genes, because curatorial judgement based on more extensive information will always provide more insight.

This is illustrated above by the comparative genomics products that are not available to these tools but are very useful in making the best possible annotation call. While such a call can never attain perfect reliability, it is still exceedingly important to make all-out use of bioinformatics, given the high cost and time needed for experimental validation and the disease burden in an era of imminent genomic medicine.