In the absence of infectious virus, strains of mice express polyadenylated RNA transcripts homologous to the genome of murine leukemia virus. In addition to transcripts consistent with full-length and spliced env retroviral RNAs, several unique RNA species that lack env sequences accumulate in a tissue-specific manner. These RNA species are presumed to be transcribed from the endogenous retroviral sequences that constitute the bulk of the murine leukemia virus-related sequences in the mouse genome. We isolated and sequenced cDNA recombinants representing the RNAs expressed in strain 129 GIX+ mice. Our aims were to determine how these transcripts relate to infectious murine leukemia virus and to define the precise structural basis of the heterogeneity observed among the env-lacking transcripts. Nucleotide sequence comparisons demonstrated that the endogenous retroviral transcripts differed by single nucleotide changes in the pol, p15E, and R-peptide regions. In contrast, the gp70-coding regions of two cDNA clones, derived from epididymis and liver, were completely homologous over a 599-nucleotide overlap. The structures of the env-lacking transcripts were examined in two independent cDNA clones. Each contained a different deletion, potentially mediated by seven-base-pair direct repeats in the intact sequence. The extensive sequence homology among the cDNAs allowed construction of a composite sequence map of the 3' end of an intact endogenous retroviral transcript. This sequence was compared with those of infectious ecotropic and mink cell focus-forming viruses. The endogenous transcripts proved highly homologous to the substituted portions of leukemogenic mink cell focus-forming viruses and therefore further define the boundaries of the recombination required to generate these viruses.