University of Toronto St. George Campus - Astronomy
Data, Cloud, and Advanced Computing for R&D
Research
Jonathan Dursi
Toronto, Canada Area
I work with teams in the broader public sector - as individuals, groups, or national communities - to help them make use of data, cloud, and advanced computing - to help them further their missions. Federated systems, teams, and data.
HPC Computational Science Specialist
SciNet is Canada’s largest supercomputing center; both a national facility, and the local focal point for scientific, technical, and data-intensive computing within the University of Toronto and associated research hospitals.
HPC Consulting:
• Consulted with research teams in forestry, climate science, biophysics, mechanical engineering, astrophysics, nanoscience, and bioinformatics to find the best ways to make use of Canada-wide HPC resources for their work and use cases;
• Initiated, created, and maintained city- and nation-wide collaborations for HPC research, computational science, and training;
• Monitored and supervised computational research projects, and recommended necessary changes;
• Developed and taught training programs on many aspects of HPC for practitioners;
• Led research project examining use of fault-tolerant technologies from telecommunications for use in large-scale HPC systems.
• Organizer or Committee Chair, HPCS ( http://www.hpcs.ca ) series of HPC conferences, 2010, 2011, 2012, 2013
• Co-organized Toronto-wide Science Illustrated workshop on use of visualization for science communication ( http://scienceillustrated.ca );
Principal
Consulting on the management of R&D and data projects in the broader public sector and beyond
• R&D Management consulting on operations and organizations
• Data operations, governance, and privacy
Interim CTO
First CTO of Compute Canada, seconded from SciNet as an interim CTO while a permanent executive team was being assembled. Responsible for coordinating technical work occurring with ~150 FTEs across the country, all aimed at working with Canadian academic researchers to get more and bigger science done, faster.
• Consulted nationally on and shaped development of a first Strategic Plan for the reconstituted national organization
• Led nation-wide reorganization of the technical operations, creating functional groups and raising the profile of national education, outreach, training, and support activities;
• Restructured operations to reduce siloing of activities
• Initiated national rationalization process for ongoing operations and maintenance spending of approximately $3.5M/yr
• Led collaborations on grants and partnerships with stakeholders across the country.
Scientific Associate, Informatics and Bio-computing
Developed novel bioinformatics methods for genomics with nanopore sequencing, and cancer genomics in a major international project (the eighteen-country Pan-Cancer Analysis of Whole Genomes)
• Developed and trained ensemble methods to analyze somatic (cancer) mutations in a set of 2,700 patients; the ensemble methods produced a data set of cancer mutations for analysis by downstream cancer biologists with an estimated 2.6 million fewer false positive calls and 1.65 million fewer false negative calls than the single best available caller, in a data set of ~47 million mutations.
• In support of the training of ensemble methods, designed the validation strategy for producing “ground truth” mutation information for a training set.
• Designed, developed, and taught first “Machine Learning for Scientists” (http://ljdursi.github.io/ML-for-scientists/#1) training class for SciNet at the University of Toronto, which was fully booked and had a waiting list within days; received very high reviews.
• Designed, developed, and taught “Beyond Single-Core R” training class (https://ljdursi.github.io/beyond-single-core-R/#/) for scaling out analyses with R.
• Winner, first place, Toronto TrafficJam Hackathon, organized by Toronto Transportation Services, a data hackathon aimed at finding novel solutions to improving Toronto transportation.
• Co-author on first method for using nanopore sequencing to directly detect epigenetic markers (here, CpG methylation) on real genomic DNA samples (nanopolish).
• Co-author on first open-source and publicly available software for implementing and testing methods for “decoding” raw nanopore signals into genomic data (nanocall).
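The ensemble approach described above can be illustrated with a minimal consensus-voting sketch. This is a hypothetical simplification, not the published method: callers are represented as sets of (chromosome, position, alternate allele) tuples, and a candidate mutation is kept only if enough callers agree.

```python
from collections import Counter

def ensemble_vote(call_sets, min_support=2):
    """Keep a candidate mutation only if at least `min_support`
    callers report it (a simple consensus vote)."""
    counts = Counter()
    for calls in call_sets:
        counts.update(set(calls))  # each caller votes at most once per variant
    return {variant for variant, n in counts.items() if n >= min_support}

# Hypothetical outputs from three variant callers
caller_a = [("chr1", 12345, "T"), ("chr2", 67890, "G")]
caller_b = [("chr1", 12345, "T")]
caller_c = [("chr1", 12345, "T"), ("chr3", 11111, "A")]

consensus = ensemble_vote([caller_a, caller_b, caller_c])
# Only the variant supported by two or more callers survives
```

Real ensemble callers weight individual callers by learned reliability rather than voting uniformly, but the core trade-off is the same: requiring more supporting callers lowers false positives at the cost of more false negatives.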
Architect and Technical Lead, CanDIG
Led a national, distributed team of ~12 FTE (~20 people) designing and building the CanDIG project, a federated platform for national-scale genomic and health data analysis over locally controlled private health data. The platform is a federated network of local health research centre sites running a 12-factor application built with cloud technologies (Docker, Go, Python, object stores, relational and NoSQL databases). The role involves constant communication with senior stakeholders across the country. $5M/4-year project with additional funding acquired.
• Delivering project on budget, on time, and exceeding scope.
• Designed novel distributed authn/z infrastructure over OpenID Connect.
• Developing novel federated algorithms for distributed bioinformatics data sources.
• Project became one of the first driver projects for the international Global Alliance for Genomics and Health, participating in standards setting and best practice definition alongside much larger projects.
• Brought two national cancer research projects on board as early adopters of the platform.
• Enabled enough success for one early adopter pilot project that CanDIG was written into planning documents for the recently announced, ~100x larger, $300M project.
• Led grant-writing efforts for $800,000 in successful grants for national and international projects.
• Chosen to participate in and lead foundational technical work package for the international CINECA project, peering CINECA with EGA/ELIXIR and H3Africa, a €6.7M Canada/EU project. Work package 1 is on track to meet all milestones.
• Participated in development of one of only two successful Strategic Innovation Fund Stream 4 grant applications, the Digital Health and Discovery Platform, where CanDIG will play a foundational role in health research
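The federated-analysis pattern behind CanDIG can be sketched in miniature: each data-owning site computes a local aggregate over its own records, and only those aggregates cross site boundaries to the coordinator. The site names, record layout, and variant labels below are hypothetical, for illustration only.

```python
def local_variant_count(site_records, variant):
    """Runs inside a data-owning site; raw patient records never leave it."""
    return sum(1 for rec in site_records if variant in rec["variants"])

def federated_count(sites, variant):
    """Coordinator combines per-site aggregates only, never raw data."""
    return sum(local_variant_count(records, variant)
               for records in sites.values())

# Hypothetical per-site record sets (in reality, each lives behind a site API)
sites = {
    "site_a": [{"variants": {"BRAF:V600E"}}, {"variants": set()}],
    "site_b": [{"variants": {"BRAF:V600E", "KRAS:G12D"}}],
}

total = federated_count(sites, "BRAF:V600E")
```

In a production federation the coordinator would call each site over an authenticated API (here, just a dict lookup), and aggregates might be further protected, but the design principle is the same: computation moves to the data, not the reverse.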
B. Sc.
Math, Physics, Computer Science
Ph.D.
Astrophysics
M. Sc.
Physics, Astrophysics, Computational Science
Member of fencing team (co-captain, foil); NSERC Post Graduate fellowship, Queen's University Reinhardt Fellowship. Thesis topic: angular momentum and galaxy formation using computational simulation.
Canadian Astronomical Society
Modern astronomical research requires increasingly sophisticated computing facilities and software tools. Computational tools have become the fundamental tools to turn observational raw data into scientific insight. Complex multi-physics simulation codes have developed into tools for numerical experiments that provide scientific insight beyond classical theory. Canadian researchers need an environment for development and maintenance of these critical tools. In particular, the drastically enhanced complexity of deeply heterogeneous hardware architectures poses a real challenge to using present and future HPC facilities. Without a national program in astrophysical simulation science and astronomy application code development, we are becoming vulnerable with respect to our ability to maximise the scientific return from existing and planned investments in astronomy. In addition, there are significant industrial/commercial HQP needs that a simulation and application code program could start to address, if it is properly aligned with academic training opportunities. We outline the requirements for a framework for developing Canadian astronomical application and simulation codes, and code builders. In the US decadal plan process, voices are calling for similar emphasis on developing infrastructure and incentives for open community codes (Weiner et al. 2009). We propose funding several small interdisciplinary teams of postdocs, graduate students, and staff, housed in departments at universities that have made or are about to make a commitment in a relevant area (e.g. applied math, computational physics, modeling science). These teams can, while training astronomical and computational HQP, focus on building tools that have been deemed high priorities by the astronomical and astrophysical communities, in order to make the best scientific use of our new computational facilities.
Canadian Astronomical Society
Advanced research computing resources have never been so essential to the Canadian Astronomy and Astrophysics research community. In the past few years, astronomical researchers have benefited greatly from modern large-scale computing systems; a diverse range of resources, which are a good match to the diverse computing needs of our scientists; and good working relationships with existing providers, allowing flexibility and collaboration between these centres and research groups. However, CASCA has concerns about the near future of advanced research computing available to its researchers. Here the Computers, Data, and Networks Committee of CASCA presents, on behalf of the Society, a summary of the current state of the computing needs, successes, and concerns of our researchers, taken from previous consultative summaries and their updates. This is the first step of a process that will continue through the first half of 2013, which will include a comprehensive survey of the research computing needs of the Canadian Astronomy and Astrophysics community, and will investigate a variety of strategies for meeting those needs. [...] In this report, we recommend an urgent search for new and sustainable sources of funding for advanced research computing; an increased focus on personnel, software development, and storage; maintaining a diverse range of systems; enabling major longer-term projects by committing resources for longer than the current one-year allocation window of the RAC process; and continuing to enable close working relationships between research groups and computing providers, preferably as close to the researchers as possible. In addition, we recommend that CCI's board, through the proposed Researcher Advisory Committee or otherwise, establish a direct relationship with CASCA (and similar professional groups), via persons charged with representing the needs of these research communities in planning for Compute Canada.
Canadian Astronomical Society
Significant investment in new large, expensive astronomical observing facilities spanning a substantial portion of the electromagnetic spectrum was a dominant theme of LRP2000 and continues to be necessary for Canadian astronomy to maintain its world position. These developments are generating increasingly large volumes of data. Such investments only make sense if they are balanced by strong infrastructure support to ensure that data acquired with these facilities can be readily accessed and analyzed by observers, and that theoreticians have the tools available to simulate and understand their context. This will require continuing investment in computational facilities to store and analyze the data, networks to ensure useful access to the data and products by Canadian researchers, and personnel to help Canadian researchers make use of these tools. In addition, large parallel simulations have become an essential tool for astrophysical theory, and Canadian astronomy has world-leading simulators and developers who rely on world-class High Performance Computing facilities being maintained in Canada to do their research effectively. We recommend that Compute Canada be funded at $72M/yr to bring HPC funding per capita in line with G8 norms; that part of every Compute Canada technology renewal include a Top-20 class computing facility; that NSERC and other funding agencies begin supporting software development as an integral component of scientific research; that stable funding for consortia be tripled, including local access to technical analyst staff; and that the last-mile bottleneck of campus networking below 10 Gb/s be addressed where it is impacting researchers, with particular urgency for the current 1 Gb/s connection at the CADC.