Acta Genetica Sinica (遗传学报), January 2005, 32(1): 1~10, ISSN 0379-4172
Genome Sequence Comparative Analysis of Long Arm and Short Arm of Human X Chromosome
LÜ Zhan-Jun①, SONG Shu-Xia, ZHAI Yu, HOU Jie, HAN Li-Zhi, WANG Xiu-Fang
(Department of Laboratory Animal, Hebei Medical University, Shijiazhuang 050017, China)
Abstract: 30% of the genes tested on Xp escaped inactivation, whereas less than 3% of the genes on Xq escaped inactivation. To investigate the molecular mechanism involved in the propagation and maintenance of X chromosome inactivation and escape, the long arm and short arm of the X chromosome were compared for RNA binding density. Nucleotide sequences on the X chromosome were divided into 50 kb segments, and each segment was recorded as a set of frequency values of 7-nucleotide (7 nt) strings using all possible 7 nt strings (4^7 = 16 384). 120 genes highly expressed in the tonsil germinal center B cells were selected for calculating the 7 nt string frequency values of all introns (intron 7nt). Intron 7nt was considered as RNAs (RNA population) that simulated the total of small RNA fragments in cells. Knowing the 7 nt frequency values of DNA segments and the intron 7nt, we can calculate the binding density of DNA segments to the intron 7nt, which was termed RNA binding density. The RNA binding density was determined by the amount of complementary sequences: the more complementary sequences, the higher the RNA binding density. The RNA binding density simulated the total of small RNA fragments bound to the DNA segment. Several principal characteristics were observed for the first time: (1) the mean RNA binding density of DNA segments on Xp was significantly higher than that on Xq (P < 0.001); (2) DNA segments that highly bind RNAs were more numerous on Xp than on Xq (P < 0.001); (3) the clusters of RNA highly binding DNA segments were associated with regions in which genes escape inactivation. It has been suggested that RNAs activate genes and that RNA-DNA interactions in cells are extensive; for example, RNAs increase the DNase Ⅰ sensitivity of DNA, there are plenty of non-protein-coding RNAs in cells, the binding specificity of DNA-RNA is far higher than that of DNA-protein, and the affinity of DNA with RNA is higher than that with DNA. The nonrandom distribution of RNA highly binding segments between Xp and Xq, combined with the finding that RNAs activate genes, provides strong evidence that RNA highly binding segments may serve as DNA signals to propagate activation along a chromosome and, vice versa, that DNA segments that bind less RNA may silence genes.
Key words: Xp; Xq; inactivation escape; intron RNA; nucleotide string
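Received: 2004-03-08; revised: 2004-09-02
About the author: LÜ Zhan-Jun (1952-), male, doctoral supervisor, vice chairman of the Hebei Society of Immunology; research interests: theories of aging, differentiation and tumorigenesis, and anti-aging and anti-tumor strategies
① Corresponding author. E-mail: ***************; Tel: 0311********, 0311********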
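Long Arm (Xq) and Short Arm (Xp) of the Human X Chromosome
LÜ Zhan-Jun①, SONG Shu-Xia, ZHAI Yu, HOU Jie, HAN Li-Zhi, WANG Xiu-Fang
(Department of Laboratory Animal, Hebei Medical University, Shijiazhuang 050017)
Abstract (translated from the Chinese): The X chromosome undergoes X chromosome inactivation, but 30% of the genes on Xp escape inactivation, compared with less than 3% on Xq. To study the molecular mechanism by which inactivation and expression escape of X chromosome genes arise and are maintained, the simulated RNA binding density of Xq and Xp DNA sequences was compared. The nucleotide sequence of the X chromosome was divided into 50 kb segments, and each segment was analysed for 7-base (7 nt) string combinations (4^7 = 16 384 combinations in total), recording the frequency of each 7 nt string in each 50 kb DNA segment. 120 genes highly expressed in germinal center B cells were selected, and the frequencies of the 7 nt strings in the introns of these genes were calculated; these are termed intron 7nt and were taken as RNAs (the RNA population, simulating the total of small RNA fragments in cells). Given the 7 nt frequency values of a DNA segment and the intron 7nt, the binding density of that DNA segment to the intron 7nt can be calculated. The binding density of each 50 kb DNA segment to the intron 7nt depends on the frequency of nucleotides in the segment that are complementary to the intron 7nt: the more complementary sequences, the greater the binding density. The simulated binding density of a DNA segment to the intron 7nt is termed RNA binding density and is intended to simulate the total amount of small RNA fragments that the DNA segment can bind. 7 nt strings were used because seven consecutive complementary nucleotides can form a relatively stable association. The study found: 1) the mean RNA binding density of DNA segments on Xp was significantly greater than that on Xq (P < 0.001); 2) the number of DNA segments that highly bind RNA was significantly higher on Xp than on Xq (P < 0.001); 3) the clusters formed by RNA highly binding DNA segments were associated with the regions of the X chromosome in which genes escape inactivation. There is evidence that RNA can activate genes by changing chromatin conformation and that this effect is general: for example, RNA increases the sensitivity of chromatin to DNase Ⅰ digestion, the affinity of complementary RNA-DNA is higher than that of complementary DNA-DNA, and cells contain abundant non-coding RNA and non-coding DNA. These findings, combined with the view that RNA activates genes, suggest that the larger number of genes escaping inactivation on Xp than on Xq may be related to the higher RNA binding density of the former.
Key words: X chromosome long arm; X chromosome short arm; inactivation escape; intron RNA; nucleotide string
CLC number: Q347 Document code: A Article ID: 0379-4172(2005)01-0001-10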
X chromosome inactivation (XCI) is the process whereby one of the two X chromosomes in normal diploid female cells is inactivated to compensate for the dosage difference of X-linked genes between males and females. One of the most intriguing aspects of X inactivation in humans is that certain genes have been found that escape inactivation and are expressed from both X chromosomes [1]. The genes that escape inactivation (expressed from both the active and inactive X chromosomes) are nonrandomly distributed, with the majority of such transcripts mapping to the short arm of the X chromosome and not to the long arm [2,3]. 30% of the genes tested on Xp escaped inactivation, whereas less than 3% of the genes on Xq escaped inactivation [4].
Although the basis for the expression of these genes from the inactive X chromosome is unclear at present, their study is likely to be informative for understanding the chromosomal mechanisms involved in X inactivation, implying the existence of local and/or chromosomal signals that distinguish genes that escape inactivation from those that are subjected to inactivation.
To investigate the molecular mechanism involved in the propagation and maintenance of X chromosome inactivation and escape, the long arm and short arm of the X chromosome were compared for RNA binding density, a computer simulation of the binding density of DNA segments and RNAs at the 7 nt string level.
1 Materials and Methods
1.3 Sequence representation
1.3.1 DNA sequences
DNA sequences on Xp (1~58 Mb) were divided into 1 144 segments, comprising 33 incomplete segments (33/1 144, 2.88%) and 1 111 complete segments. DNA sequences on Xq (60~153 Mb) were divided into 1 853 segments, comprising 40 incomplete segments (40/1 853, 2.16%) and 1 813 complete segments. Every complete segment (50 kb) was recorded as frequency values of 7 nt strings using all possible 7 nt strings (4^7 = 16 384). Incomplete segments in which more than 10% of the nucleotides were known were included in the following counts; those with less than 10% known nucleotides were not counted. Incomplete segments were counted as follows: the frequency values of the 7 nt strings were determined and summed (the sum equals the number of nucleotides − 6), and the sum was then divided by 49 994 (the sum of the frequency values of a complete 50 kb segment) to give the coefficient of the incomplete segment. The frequency value multiplied by the coefficient gives the adjusted value of the incomplete segment. The available segments numbered 1 133 on Xp and 1 841 on Xq.
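The counting step described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' in-house search software: the names `count_7nt` and `coefficient` are hypothetical, and skipping windows that contain an unknown base is an assumption about how unknown nucleotides were handled. It shows why a complete 50 kb segment yields 49 994 overlapping 7 nt strings and how the coefficient of a segment is obtained from the sum of its frequency values.

```python
from collections import Counter

K = 7                                     # string length used in the paper
SEGMENT_LENGTH = 50_000                   # one complete segment (50 kb)
FULL_WINDOWS = SEGMENT_LENGTH - (K - 1)   # 49 994 overlapping 7 nt windows
KNOWN_BASES = set("ACGT")


def count_7nt(sequence: str) -> Counter:
    """Count overlapping 7 nt strings in a sequence.

    Assumption of this sketch: a window containing any base other than
    A, C, G or T (for example N) is skipped.
    """
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - K + 1):
        window = sequence[i:i + K]
        if set(window) <= KNOWN_BASES:
            counts[window] += 1
    return counts


def coefficient(counts: Counter) -> float:
    """Sum of the frequency values divided by 49 994, as described in the text."""
    return sum(counts.values()) / FULL_WINDOWS
```

For a complete segment the coefficient is 1; for an incomplete segment it is smaller, and it is then applied to the frequency values as stated in the text.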
1.3.2 RNA sequences
120 genes highly expressed in the tonsil germinal center B cells were selected for calculating the 7 nt string frequency values of all introns (from the sense strand). Each intron sequence was recorded initially as a set of 7 nt frequency values. For each gene, the 7 nt frequency values of the same 7-nucleotide string were summed over all introns within the gene and multiplied by the expression frequency of the gene, giving the intron 7 nt frequency values of this gene. The sum of the intron 7 nt frequency values of the 120 genes (intron 7nt) was regarded as a simulation of RNA fragments in cells.
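The weighting described above might be sketched as below. The function names and the input layout (a list of `(introns, expression_frequency)` pairs) are assumptions made for illustration; the paper does not describe its data structures.

```python
from collections import Counter

K = 7


def count_7nt(sequence: str) -> Counter:
    """Count overlapping 7 nt strings in one intron (sense strand, T in place of U)."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + K] for i in range(len(sequence) - K + 1))


def intron_7nt(genes: list[tuple[list[str], float]]) -> Counter:
    """Sum over genes of (intron counts summed within the gene) x (expression frequency).

    `genes` is a hypothetical input format: one (intron sequences,
    expression frequency) pair per gene.
    """
    total = Counter()
    for introns, expression_frequency in genes:
        gene_counts = Counter()
        for intron in introns:
            gene_counts.update(count_7nt(intron))
        for string, value in gene_counts.items():
            total[string] += value * expression_frequency
    return total
```

With two toy genes, `intron_7nt([(["AAAAAAAA", "AAAAAAAC"], 2.0), (["AAAAAAAC"], 0.5)])` weights the first gene's summed intron counts by 2.0 and the second gene's by 0.5 before adding them.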
1.1 Sequence data
Nucleotide sequences of the X chromosome were obtained from the NCBI genome database (http://www.ncbi.nlm.nih.gov/genome/guide). Based on the results of Digital Differential Display (DDD) (http://www.ncbi.nlm.nih.gov/UniGene.ddd.cgi), 120 genes highly expressed in tonsil germinal center B cells were selected. They are: RPL13, YY1, GLTSCR2, KIAA0217, INPP5D, NCF1, NCUBE1, PTBP1, MLL, TCF3, BACH2, YWHAQ, UBE2H, CGGBP1, CDC2, GGA2, SERP1, EGLN2, DUT, BCL2L12, WSB1, PTEN, MBNL, PAX5, BCL11A, FTH1, SMC4L1, CSNK1A1, OSBPL8, WHSC1, ALOX5, KIAA0084, ZFP91, KRAS2, FBP17, ZNF265, FUSIP1, FOXP1, CYorf15B, UGCG, CAMK2D, C9orf5, CLSTN1, DC8, CENTB2, NKTR, STK39, RERE, PSP1, PBP, MBP, DAPP1, FLJ11273, KIAA1323, NAP1L1, RASGRP1, CPNE3, UNC93B1, KIAA1033, ARS2, UBQLN1, LYN, TOMM20-PENDING, KIAA0746, PB1, MFNG, HSPCA, EIF4EBP2, GLS, FLJ22301, EHD1, ELF1, KIAA1268, OAZIN, FLJ10342, CEP1, BART1, BTF, FLJ20333, RCOR, GDI2, FLJ10407, APLP2, HNRPH1, MGC4796, CASP8, PTPRCAP, HRB2, PRKACB, MEF2B, NOLC1, LZ16, CAST, ADD3, AKAP13, AES, FLJ10392, FLJ20085, PSCD1, EIF2AK3, DDX18, CYBB, NAP1L4, PPP3CC, FLJ10707, CHERP, KIAA0494, DMTF1, RER1, MYBL1, FANCA, HSD17B12, CBX3, GNAS, NUP153, RANBP2, JJAZ1, ATM, ICAP-1A and NUP88.
1.2 Software
The search software used for analysis of the frequency values of 7-nucleotide strings was written by the staff in our research team. A 50 kb DNA sequence would be represented by a long column of numbers whose sum was 49 994. MS EXCEL software was used for statistical analysis.
1.4 RNA binding density of DNA sequences
Knowing the 7 nt frequency values of DNA segments and the intron 7nt, we can calculate the binding density of DNA segments to the intron 7nt. Intron 7nt is RNA, except that T substitutes for U. The binding density of DNA segments to intron 7nt was determined by the amount of complementary sequences. Intron 7nt was multiplied by the 7 nt frequency values of the 50 kb DNA segment on the same row, and the sum of the products in the same column was the intron 7nt binding density of this DNA segment:

$$C_1 \times E_1 + C_2 \times E_2 + C_3 \times E_3 + \cdots + C_{16\,384} \times E_{16\,384} = F_1 + F_2 + F_3 + \cdots + F_{16\,384} = \text{binding density of intron 7nt to DNA}$$

This value simulates the total amount of RNA fragments binding to the DNA segment. Calculation of the binding density of DNA segments to the intron 7nt (RNA binding density) is shown in Table 1.
Table 1 Calculation of the binding density of DNA segments to the intron 7nt #

Columns A to C: 7 nt string frequency values of 50 kb DNA segment; columns D and E: intron 7nt; column F: C × E.

| Row | A | B | C | D | E | F (C × E) |
|---|---|---|---|---|---|---|
| 1 | 5′-AAAAAAA-3′ | 3′-TTTTTTT-5′ | 152 | 5′-AAAAAAA-3′ | 934.39 | 142027.28 ## |
| 2 | 5′-AAAAAAC-3′ | 3′-TTTTTTG-5′ | 10 | 5′-AAAAAAC-3′ | 84.01 | 840.10 ## |
| 3 | 5′-AAAAAAG-3′ | 3′-TTTTTTC-5′ | 12 | 5′-AAAAAAG-3′ | 135.89 | 1630.68 ## |
| … | … | … | … | … | … | … |
| 16384 | 5′-TTTTTTT-3′ | 3′-AAAAAAA-5′ | 186 | 5′-TTTTTTT-3′ | 1417.7 | 263692.20 ## |
| Sum ### (RNA binding density) | | | | | | |

#: This is an Excel table and shows the calculation method of RNA binding density in a 50 kb DNA segment. ##: C1 multiplied by E1, C2 multiplied by E2, C3 multiplied by E3, and C16 384 multiplied by E16 384, respectively. ###: ΣF1-F16 384 = RNA binding density of the DNA segment.
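The arithmetic in Table 1 can be checked with a short Python sketch. The function name `rna_binding_density` is hypothetical; the numbers are the four rows printed in Table 1, so the sum computed here is only a partial sum over those four strings, not the full RNA binding density of the segment.

```python
def rna_binding_density(dna_counts: dict[str, float], intron_7nt: dict[str, float]) -> float:
    """Sum over 7 nt strings of C (DNA frequency value) x E (intron 7nt value)."""
    return sum(c * intron_7nt.get(string, 0.0) for string, c in dna_counts.items())


# Rows 1, 2, 3 and 16 384 of Table 1 (column C and column E).
dna_counts = {"AAAAAAA": 152, "AAAAAAC": 10, "AAAAAAG": 12, "TTTTTTT": 186}
intron_7nt = {"AAAAAAA": 934.39, "AAAAAAC": 84.01, "AAAAAAG": 135.89, "TTTTTTT": 1417.7}

# Column F: the per-string products C x E.
products = {s: dna_counts[s] * intron_7nt[s] for s in dna_counts}

# Partial sum over the four printed rows only.
partial_sum = rna_binding_density(dna_counts, intron_7nt)
```

Rounded to two decimals, the four products reproduce the values in column F of the table.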
The nucleotide sequences on intron 7nt are complementary; for example, the frequency value of 5′-AAAAAAA-3′ is close to that of 5′-TTTTTTT-3′. So the binding density of intron 7nt to the 5′→3′ DNA strand is similar to that to the 3′→5′ DNA strand. In this paper the results for single-strand DNA are reported.
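The strand argument above can be illustrated with a small sketch. All names are hypothetical, and reading the opposite strand through the reverse complement of each 7 nt string is an assumption of this sketch rather than a procedure stated in the paper. It shows that when the intron 7nt values of a string and of its reverse complement are equal, the density computed against either strand is the same.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(string: str) -> str:
    """Return the 5'->3' sequence of the opposite strand."""
    return string.translate(COMPLEMENT)[::-1]


def opposite_strand_counts(counts: dict[str, float]) -> dict[str, float]:
    """7 nt counts read 5'->3' on the opposite strand of the same DNA segment."""
    return {reverse_complement(s): v for s, v in counts.items()}


def density(dna_counts: dict[str, float], intron_7nt: dict[str, float]) -> float:
    """Sum of products of DNA counts and intron 7nt values on the same string."""
    return sum(v * intron_7nt.get(s, 0.0) for s, v in dna_counts.items())
```

With the Table 1 values (934.39 for AAAAAAA and 1417.7 for TTTTTTT) the two intron values are not equal, so in this sketch the two strand densities for a segment would differ somewhat rather than coincide exactly.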
2 Results
2.1 RNA binding density of DNA segments on Xp and Xq
Each 50 kb of DNA was considered as one segment; DNA sequences on Xp and Xq were therefore divided into 1 133 and 1 841 available segments, respectively. The 7 nt frequency values of each segment were multiplied by intron 7nt on the same string. The sum of the products in the same column was the RNA binding density. The binding density of RNAs and every DNA segment was calculated (Fig. 1, Fig. 2). The mean binding density of RNAs and DNA segments was significantly higher on Xp than on Xq (Table 2).
Xp and Xq were compared for the numbers of segments that can highly bind RNAs (high RNA binding density). In all ranges from > 3 000 000 to > 2 520 000, the numbers of DNA segments that highly bind RNAs on Xp are significantly higher than those on Xq (P < 0.001, Table 3).
2.2 RNA binding density of DNA segments in escape and inactivation regions on Xp
The regions in which escape genes were concentrated were designated escape regions (escape inactivation regions); the other regions were considered inactivation regions (Fig. 1). The mean RNA binding density of DNA segments in escape regions was significantly higher than that in inactivation regions (P < 0.001, Table 4). Two escape genes (eIF-2 gamma and SMCX) in inactivation regions were located within the RNA highly binding segments (Fig. 1).
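The two kinds of comparison reported in the Results (mean RNA binding density, and the number of segments above a binding density threshold) can be sketched as follows. The function names are hypothetical and the sample values are invented toy numbers, not data from the paper. No significance test is included, because the text does not state which test produced the reported P values.

```python
from statistics import mean


def count_above(densities: list[float], threshold: float) -> int:
    """Number of segments whose RNA binding density exceeds the threshold."""
    return sum(1 for d in densities if d > threshold)


def compare_groups(group_a: list[float], group_b: list[float], thresholds: list[float]) -> dict:
    """Mean density of each group and, per threshold, the fraction of segments above it."""
    summary = {"mean_a": mean(group_a), "mean_b": mean(group_b), "above": {}}
    for t in thresholds:
        summary["above"][t] = (
            count_above(group_a, t) / len(group_a),
            count_above(group_b, t) / len(group_b),
        )
    return summary


# Invented toy densities for two groups of segments (for example Xp and Xq).
toy_a = [3_100_000, 2_600_000, 2_000_000, 2_900_000]
toy_b = [2_550_000, 1_900_000, 2_100_000, 3_050_000, 1_800_000]
summary = compare_groups(toy_a, toy_b, [2_520_000, 3_000_000])
```

The same helper applies to the escape versus inactivation regions on Xp by passing the densities of the segments in each kind of region as the two groups.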
Fig. 1 Simulation of binding of RNAs and DNA segments on Xp
The numbers below and above the horizontal line are genome positions from 1 Mb to 58 Mb (multiply by 50 kb) and gene numbers in those regions, respectively.
“o”: Escape genes. SLC25A6 (genome position = 21.6), DXYS155E, ALTE, stSG15779, MIC2, StSG9723, StSG1369, ARSD, GS1, Hs.79876, GS2, SEDT, CXORF5, INE2, PIR, GRPR, StSG4551, RbAp46, eIF-2 gamma, CRSP150, DFFRX, DDX3, INE1, UTX, UBE1, PCTK1, SMCX (genome position = 1 038).
“●”: Inactivation genes. WI-17390, MID1, HCCS, PRPS2, WI-14561, PIGA, RAI2, SCML2, PDHA1, ISPK-1, SMS, SAT, PDK3, stSG8688, POLA, GK, AA156453, PRGP1, RP3, CASK, DXS8237E, ZNF157, ARAF1, ELK1, STS-N34520, JM23, EBP, RBM3, A007K03, JM21, UGALT, Pim2H, stSG4507, TFE3, T54, A4, LMO6, WI-21198, A007H45, A007D27, IB3700, IB772, DXS1013E, FGD1, DXS7159E, TRO, stSG13253, UQCRB, WI-14025, ZXDB, ZXDA.
“3”: Elevated expression genes. Hs.25625, MSL3L1, GPM6B, PHKA2, MT-ACT48, Hs.103104, Hs.192846, GATA1.