EMS-GT2: An Improved Exact Solution for the (l, d)-Planted Motif Problem
Volume 3, Issue 3 MARK JOSEPH D. RONQUILLO, PROCESO L. FERNANDEZ
Published online: 22 June 2017
Article Views: 50
Abstract
In this research, the EMS-GT2 algorithm, an extension of the Exact Motif Search-Generate and Test (EMS- GT), an exact enumerative algorithm for PMS, is being proposed. In EMS-GT2, a new speedup technique based on an important property is discovered and incorporated. This speedup has enabled a more efficient block-processing of candidate motifs. The C++ implementation of EMS-GT2 running on synthetic data for several PMS challenge instances demonstrates that it is competitive with both the EMS-GT and qPMS9, the two current best exact solutions for PMS. In particular, EMS-GT2 is able to reduce the run-times of EMS-GT by 20.3%, 15.8% and 22.6% for the (l, d) challenge instances (13, 4), (15, 5) and (17, 6) respectively. It also outperforms qPMS9, having runtime reductions of 91.6%, 79.3%, 82.0%, 59.4% and 9.7% for the (9, 2), (11, 3), (13, 4), (15, 5) and (17, 6) synthetic challenge instances respectively.
Reference
S. Rajasekaran, S. Balla and C. H. Huang, “Exact algorithms for planted motif problems,” Journal of Computational Biology, vol. 12, no. 8, pp. 1117-1128, 2005.
P. A. Evans, A. D. Smith and H. T. Wareham, “On the complexity of finding common approximate substrings,” Theoretical Computer Science, vol. 306, no. 1, pp. 407-430, 2003.
M. C. I. D. Sia, J. Q. Nabos and P. L. Fernandez, “An efficient exact solution for the (l, d)-planted motif problem,” in 8th AUN/SEED-Net Regional Conference on Electrical and Electronics Engineering, Manila, Philippines, Nov. 16-17, 2015.
A. J. Stewart, S. Hannenhalli and J. B. Plotkin, “Why transcription factor binding sites are ten nucleotides long,” Genetics, vol. 192, no. 3, pp. 973-985, 2012.
C. E. Lawrence, S. F. Altschul, M. S. Boguski, J. S. Liu, A. F. Neuwald and J. C. Wootton, “Detecting subtle sequence signals: A Gibbs sampling strategy for multiple alignment,” Science, vol. 262, no. 5131, pp. 208-214, 1993.
T. L. Bailey, N. Williams, C. Misleh and W. W. Li, “MEME: Discovering and analyzing DNA and protein sequence motifs,” Nucleic Acids Research, vol. 34, pp. W369-W373, 2006.
H. Buhler and M. Tompa, “Finding motifs using random projections,” in Proceeding of the Fifth Annual International Conference on Computational Biology, New York, NY, 2001, pp. 69-76.
H. Huo, Z. Zhao, V. Stojkovic and L. Liu, “Combining genetic algorithm and random projection strategy for (l, d)-motif discovery,” in Fourth International Conference on Bio-Inspired Computing, Beijing, China, Oct. 16-19, 2009. pp. 1-6.
P. A. Pevzner and S. H. Sze, “Combinatorial approaches to finding subtle signals in DNA sequences,” in International Conference on Intelligent Systems for Molecular Biology, San Diago, CA, Aug. 19-23, 2000.
U. Keich and P. Pevzner, “Finding motifs in the twilight zone,” Bioinformatics, vol. 18, no. 10, pp. 1374-1381, 2002.
A. Price, S. Ramabhadran and P. A. Pevzner, “Finding subtle motifs by branching from sample strings,” Bioinformatics, vol. 19, no. 2, pp. 149-155, 2003.
G. Z. Hertz and G. D. Stormo, “Identifying DNA and protein patterns with statistically significant alignments of multiple sequences,” Bioinformatics, vol. 15, no. 7, pp. 563-577, 1999.
J. Davila, S. Balla and S. Rajasekaran, “Fast and practical algorithms for planted (l, d) motif search,” IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), vol. 4, no. 4, pp. 544-552, 2007.
H. Dinh, S. Rajasekaran and V. K. Kundeti, “PMS5: An efficient exact algorithm for the (l, d)-motif finding problem,” BMC Bioinformatics, vol. 12, no. 1, pp. 01-10, 2011.
S. Bandyopadhyay, S. Sahni and S. Rajasekaran, “PMS6: A fast algorithm for motif discovery,” International Journal of Bioinformatics Research and Applications 2, vol. 10, no. 4-5, pp. 369-383, 2014.
H. Dinh, S. Rajasekaran and J. Davila, “qPMS7: A fast algorithm for finding (l, d)-motifs in DNA and protein sequences,” PloS One, vol. 7, no. 7, pp. 01-08, 2012.
M. Nicolae and S. Rajasekaran, “Efficient sequential and parallel algorithms for planted motif search,” BMC Bioinformatics, vol. 15, no. 1, pp. 01-10, 2014.
M. Nicolae and S. Rajasekaran, “qPMS9: An efficient algorithm for quorum planted motif search,” Scientific Reports, vol. 5, pp. 01-017, 2015.
A. M. Carvalho, A. T. Freitas, A. L. Oliveira and M. F. Sagot, “A highly scalable algorithm for the extraction of cisregulatory regions,” in 3rd Asia-Pacific Bioinformatics Conference, Singapore, Jan. 17-21, 2005.
N. Pisanti, A. M. Carvalho, L. Marsan and M. F. Sagot, “RISOTTO: Fast extraction of motifs with mismatches,” in Latin American Symposium on Theoretical Informatics, Valdivia, Chile, March 20-24, 2006, pp. 757-768.
M. Sagot, “Spelling approximate repeated or common motifs using a suffix tree,” in Lecture Notes in Computer Science, C. L. Lucchesi and A. V. Moura, Eds. Berlin, Germany: Springer.
E. Eskin and P. A. Pevzner, “Finding composite regulatory patterns in DNA sequences,” Bioinformatics, vol. 18, no. 1, pp. S354-S363, 2002.
F. Y. Chin and H. C. Leung, “Voting algorithms for discovering long motifs,” in 3rd Asia-Pacific Bioinformatics Conference, Singapor, Jan. 17-21, 2005, pp. 261-271.
N. S. Dasari, R. Desh and M. Zubair, “An efficient multicore implementation of planted motif problem,” in International Conference on High Performance Computing and Simulation (HPCS), Caen, France, June 28- July 2, 2010, pp. 9-15.
To Cite this article
M. J. D. Ronquillo and P. L. Fernandez, “EMS-GT2: An improved exact solution for the (l, d)-planted motif problem,” International Journal of Technology and Engineering Studies, vol. 3, no. 3, pp. 124-132, 2017.