Genome assembly: challenges in the spruce genome project

Conifers have very large genomes, seven times the size of the human genome, which made the Norway spruce genome project uncertain from the start.

Early in the project, it became clear that most software for assembling and analysing genomes could not handle the large datasets that the project generated. There were other challenges as well: the spruce is a diploid with 12 chromosome pairs, but the differences within chromosome pairs are almost as large as those between human and chimpanzee chromosomes. In addition, the genome contains very large repeated regions, which are known to be problematic for genome assembly.
These problems were met by a combination of experimental techniques and improved computational techniques, with Lars Arvestad, Kristoffer Sahlin, and Francesco Vezzi working on several computational projects. One example is the BESST scaffolder, which was the first scaffolder to correctly estimate the distance between two contigs, i.e. assembled genome segments (Sahlin et al, 2012), an important feature for the data in the project. The assembly merger GAM-NGS (Vicedomini et al, 2013) also helped in combining different pieces of genome information.
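
To make the idea of estimating the distance between two contigs concrete, here is a minimal sketch of a naive gap estimate from read pairs that span the gap. It is an illustration only and not the BESST method: the function name, the input layout, and the numbers are assumptions made for this example. It also ignores the fact that only sufficiently long fragments can span a gap, which biases a simple estimate like this one.

```python
from statistics import mean


def naive_gap_estimate(mean_insert_size, spanning_pairs):
    """Naive estimate of the gap between two contigs (illustrative only).

    mean_insert_size: average fragment length of the read-pair library.
    spanning_pairs: list of (d1, d2) tuples, one per read pair with one
        read on each contig. d1 is the part of the fragment lying on the
        first contig (read start to contig end), d2 the part lying on the
        second contig (contig start to mate end). Hypothetical layout.
    """
    if not spanning_pairs:
        raise ValueError("need at least one read pair spanning the gap")
    # Average length of the fragment that falls on the two contigs.
    covered = mean(d1 + d2 for d1, d2 in spanning_pairs)
    # Whatever is left of the expected fragment length is taken as the gap.
    return mean_insert_size - covered


# Hypothetical data: a library with 3000 bp fragments and three spanning pairs.
pairs = [(1000, 1200), (900, 1100), (1100, 1000)]
print(naive_gap_estimate(3000, pairs))
```

With these made-up numbers, the fragments cover on average 2100 bp of contig sequence, so the naive estimate of the gap is 900 bp.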

References: 

• Nystedt B, Street NR, Wetterbom A, Zuccolo A, Lin YC, Scofield DG, Vezzi F, Delhomme N, Giacomello S, Alexeyenko A, Vicedomini R, Sahlin K, Sherwood E, Elfstrand M, Gramzow L, Holmberg K, Hallman J, Keech O, Klasson L, Koriabine M, Kucukoglu M, Kaller M, Luthman J, Lysholm F, Niittyla T, Olson A, Rilakovic N, Ritland C, Rossello JA, Sena J, Svensson T, Talavera-Lopez C, Theissen G, Tuominen H, Vanneste K, Wu ZQ, Zhang B, Zerbe P, Arvestad L, Bhalerao R, Bohlmann J, Bousquet J, Garcia Gil R, Hvidsten TR, de Jong P, MacKay J, Morgante M, Ritland K, Sundberg B, Thompson SL, Van de Peer Y, Andersson B, Nilsson O, Ingvarsson PK, Lundeberg J, Jansson S (2013) "The Norway spruce genome sequence and conifer genome evolution." Nature 497(7451), 579-584
• Vicedomini R, Vezzi F, Scalabrin S, Arvestad L, Policriti A (2013) "GAM-NGS: genomic assemblies merger for next generation sequencing." BMC Bioinformatics 14 Suppl 7, S6
• Sahlin K, Street N, Lundeberg J, Arvestad L (2012) "Improved gap size estimation for scaffolding algorithms." Bioinformatics 28(17), 2215-2222