EMS-GT2: An Improved Exact Solution for the (l, d)-Planted Motif Problem

Volume 3, Issue 3
MARK JOSEPH D. RONQUILLO, PROCESO L. FERNANDEZ
Published online: 22 June 2017
Article Views: 50

Abstract
In this research, the EMS-GT2 algorithm, an extension of the Exact Motif Search-Generate and Test (EMS- GT), an exact enumerative algorithm for PMS, is being proposed. In EMS-GT2, a new speedup technique based on an important property is discovered and incorporated. This speedup has enabled a more efficient block-processing of candidate motifs. The C++ implementation of EMS-GT2 running on synthetic data for several PMS challenge instances demonstrates that it is competitive with both the EMS-GT and qPMS9, the two current best exact solutions for PMS. In particular, EMS-GT2 is able to reduce the run-times of EMS-GT by 20.3%, 15.8% and 22.6% for the (l, d) challenge instances (13, 4), (15, 5) and (17, 6) respectively. It also outperforms qPMS9, having runtime reductions of 91.6%, 79.3%, 82.0%, 59.4% and 9.7% for the (9, 2), (11, 3), (13, 4), (15, 5) and (17, 6) synthetic challenge instances respectively.
Reference
- S. Rajasekaran, S. Balla and C. H. Huang, “Exact algorithms for planted motif problems,” Journal of Computational Biology, vol. 12, no. 8, pp. 1117-1128, 2005.
- P. A. Evans, A. D. Smith and H. T. Wareham, “On the complexity of finding common approximate substrings,” Theoretical Computer Science, vol. 306, no. 1, pp. 407-430, 2003.
- M. C. I. D. Sia, J. Q. Nabos and P. L. Fernandez, “An efficient exact solution for the (l, d)-planted motif problem,” in 8th AUN/SEED-Net Regional Conference on Electrical and Electronics Engineering, Manila, Philippines, Nov. 16-17, 2015.
- A. J. Stewart, S. Hannenhalli and J. B. Plotkin, “Why transcription factor binding sites are ten nucleotides long,” Genetics, vol. 192, no. 3, pp. 973-985, 2012.
- C. E. Lawrence, S. F. Altschul, M. S. Boguski, J. S. Liu, A. F. Neuwald and J. C. Wootton, “Detecting subtle sequence signals: A Gibbs sampling strategy for multiple alignment,” Science, vol. 262, no. 5131, pp. 208-214, 1993.
- T. L. Bailey, N. Williams, C. Misleh and W. W. Li, “MEME: Discovering and analyzing DNA and protein sequence motifs,” Nucleic Acids Research, vol. 34, pp. W369-W373, 2006.
- H. Buhler and M. Tompa, “Finding motifs using random projections,” in Proceeding of the Fifth Annual International Conference on Computational Biology, New York, NY, 2001, pp. 69-76.
- H. Huo, Z. Zhao, V. Stojkovic and L. Liu, “Combining genetic algorithm and random projection strategy for (l, d)-motif discovery,” in Fourth International Conference on Bio-Inspired Computing, Beijing, China, Oct. 16-19, 2009. pp. 1-6.
- P. A. Pevzner and S. H. Sze, “Combinatorial approaches to finding subtle signals in DNA sequences,” in International Conference on Intelligent Systems for Molecular Biology, San Diago, CA, Aug. 19-23, 2000.
- U. Keich and P. Pevzner, “Finding motifs in the twilight zone,” Bioinformatics, vol. 18, no. 10, pp. 1374-1381, 2002.
- A. Price, S. Ramabhadran and P. A. Pevzner, “Finding subtle motifs by branching from sample strings,” Bioinformatics, vol. 19, no. 2, pp. 149-155, 2003.
- G. Z. Hertz and G. D. Stormo, “Identifying DNA and protein patterns with statistically significant alignments of multiple sequences,” Bioinformatics, vol. 15, no. 7, pp. 563-577, 1999.
- J. Davila, S. Balla and S. Rajasekaran, “Fast and practical algorithms for planted (l, d) motif search,” IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), vol. 4, no. 4, pp. 544-552, 2007.
- H. Dinh, S. Rajasekaran and V. K. Kundeti, “PMS5: An efficient exact algorithm for the (l, d)-motif finding problem,” BMC Bioinformatics, vol. 12, no. 1, pp. 01-10, 2011.
- S. Bandyopadhyay, S. Sahni and S. Rajasekaran, “PMS6: A fast algorithm for motif discovery,” International Journal of Bioinformatics Research and Applications 2, vol. 10, no. 4-5, pp. 369-383, 2014.
- H. Dinh, S. Rajasekaran and J. Davila, “qPMS7: A fast algorithm for finding (l, d)-motifs in DNA and protein sequences,” PloS One, vol. 7, no. 7, pp. 01-08, 2012.
- M. Nicolae and S. Rajasekaran, “Efficient sequential and parallel algorithms for planted motif search,” BMC Bioinformatics, vol. 15, no. 1, pp. 01-10, 2014.
- M. Nicolae and S. Rajasekaran, “qPMS9: An efficient algorithm for quorum planted motif search,” Scientific Reports, vol. 5, pp. 01-017, 2015.
- A. M. Carvalho, A. T. Freitas, A. L. Oliveira and M. F. Sagot, “A highly scalable algorithm for the extraction of cisregulatory regions,” in 3rd Asia-Pacific Bioinformatics Conference, Singapore, Jan. 17-21, 2005.
- N. Pisanti, A. M. Carvalho, L. Marsan and M. F. Sagot, “RISOTTO: Fast extraction of motifs with mismatches,” in Latin American Symposium on Theoretical Informatics, Valdivia, Chile, March 20-24, 2006, pp. 757-768.
- M. Sagot, “Spelling approximate repeated or common motifs using a suffix tree,” in Lecture Notes in Computer Science, C. L. Lucchesi and A. V. Moura, Eds. Berlin, Germany: Springer.
- E. Eskin and P. A. Pevzner, “Finding composite regulatory patterns in DNA sequences,” Bioinformatics, vol. 18, no. 1, pp. S354-S363, 2002.
- F. Y. Chin and H. C. Leung, “Voting algorithms for discovering long motifs,” in 3rd Asia-Pacific Bioinformatics Conference, Singapor, Jan. 17-21, 2005, pp. 261-271.
- N. S. Dasari, R. Desh and M. Zubair, “An efficient multicore implementation of planted motif problem,” in International Conference on High Performance Computing and Simulation (HPCS), Caen, France, June 28- July 2, 2010, pp. 9-15.
To Cite this article
M. J. D. Ronquillo and P. L. Fernandez, “EMS-GT2: An improved exact solution for the (l, d)-planted motif problem,” International Journal of Technology and Engineering Studies, vol. 3, no. 3, pp. 124-132, 2017.
|