Chanchal Roy

 Chanchal Roy

Chanchal Roy

  • Courses3
  • Reviews13

Biography

University of Saskatchewan - Computer Science



Experience

  • Department of Computer Science, University of Saskatchewan

    Assistant Professor

    Teaching + Research in Software Engineering

  • Department of Computer Science, University of Saskatchewan

    Professor

    Teaching + Research in Software Engineering

  • Department of Computer Science, University of Saskatchewan

    Associate Professor

    Teaching + Research in Software Engineering

Education

  • Khulna University

    BSc in CSE

    Bachelor of Science in Computer Science and Engineering

  • Queen's University

    PhD

    Computer Software Engineering

  • RWTH Aachen

    MSc

    Software Systems Engineering

Publications

  • Analyzing and Forecasting Near-miss Clones in Evolving Software: An Empirical Study

    16th IEEE International Conference on Engineering of Complex Computer Systems (ICECCS 2011)

    Effort for development and maintenance of complex large software is believed to have dependency on the amount of duplicated code fragments (code clones) present in codebases. For example, clones need to be carefully and consistently maintained and/or refactored for preventing accidental error propagation. Thus it is important to understand the proportion and evolution of clones in evolving software systems for cost estimation or the like. This paper presents a study on the evolution of near-miss clones at release level in medium to large open source software systems of different types (operating systems, database systems, editors, etc.) written in three different programming languages namely C, C#, and Java. Using a hybrid clone detector, NiCad, we detected both exact and near-miss clones at different levels of similarity. Applying statistical methods we investigated, from different dimensions, the evolution of both exact and nearmiss clones, and also forecasted the amount of clones in future releases of the software systems. Our study offers significant insights into the existence and evolution of code clones and their relationships with programming language or paradigm and program size.

  • Analyzing and Forecasting Near-miss Clones in Evolving Software: An Empirical Study

    16th IEEE International Conference on Engineering of Complex Computer Systems (ICECCS 2011)

    Effort for development and maintenance of complex large software is believed to have dependency on the amount of duplicated code fragments (code clones) present in codebases. For example, clones need to be carefully and consistently maintained and/or refactored for preventing accidental error propagation. Thus it is important to understand the proportion and evolution of clones in evolving software systems for cost estimation or the like. This paper presents a study on the evolution of near-miss clones at release level in medium to large open source software systems of different types (operating systems, database systems, editors, etc.) written in three different programming languages namely C, C#, and Java. Using a hybrid clone detector, NiCad, we detected both exact and near-miss clones at different levels of similarity. Applying statistical methods we investigated, from different dimensions, the evolution of both exact and nearmiss clones, and also forecasted the amount of clones in future releases of the software systems. Our study offers significant insights into the existence and evolution of code clones and their relationships with programming language or paradigm and program size.

  • Conflict-aware Optimal Scheduling of Code Clone Refactoring: A Constraint Programming Approach

    19th IEEE International Conference on Program Comprehension (ICPC 2011) - student symposium

    Duplicated code, also known as code clones, are one of the malicious ‘code smells’ that often need to be removed through refactoring for enhancing maintainability. Among all the potential refactoring opportunities, the choice and order of a set of refactoring activities may have distinguishable effect on the design/code quality. Moreover, there may be dependencies and conflicts among those refactorings. The organization may also impose priorities on certain refactoring activities. Addressing all these conflicts, priorities, and dependencies, manual formulation of an optimal refactoring schedule is very expensive, if not impossible. Therefore, an automated refactoring scheduler is necessary, which will maximize benefit and minimize refactoring effort. In this paper, we present a refactoring effort model, and propose a constraint programming approach for conflict-aware optimal scheduling of code clone refactoring.

  • Analyzing and Forecasting Near-miss Clones in Evolving Software: An Empirical Study

    16th IEEE International Conference on Engineering of Complex Computer Systems (ICECCS 2011)

    Effort for development and maintenance of complex large software is believed to have dependency on the amount of duplicated code fragments (code clones) present in codebases. For example, clones need to be carefully and consistently maintained and/or refactored for preventing accidental error propagation. Thus it is important to understand the proportion and evolution of clones in evolving software systems for cost estimation or the like. This paper presents a study on the evolution of near-miss clones at release level in medium to large open source software systems of different types (operating systems, database systems, editors, etc.) written in three different programming languages namely C, C#, and Java. Using a hybrid clone detector, NiCad, we detected both exact and near-miss clones at different levels of similarity. Applying statistical methods we investigated, from different dimensions, the evolution of both exact and nearmiss clones, and also forecasted the amount of clones in future releases of the software systems. Our study offers significant insights into the existence and evolution of code clones and their relationships with programming language or paradigm and program size.

  • Conflict-aware Optimal Scheduling of Code Clone Refactoring: A Constraint Programming Approach

    19th IEEE International Conference on Program Comprehension (ICPC 2011) - student symposium

    Duplicated code, also known as code clones, are one of the malicious ‘code smells’ that often need to be removed through refactoring for enhancing maintainability. Among all the potential refactoring opportunities, the choice and order of a set of refactoring activities may have distinguishable effect on the design/code quality. Moreover, there may be dependencies and conflicts among those refactorings. The organization may also impose priorities on certain refactoring activities. Addressing all these conflicts, priorities, and dependencies, manual formulation of an optimal refactoring schedule is very expensive, if not impossible. Therefore, an automated refactoring scheduler is necessary, which will maximize benefit and minimize refactoring effort. In this paper, we present a refactoring effort model, and propose a constraint programming approach for conflict-aware optimal scheduling of code clone refactoring.

  • Evaluating Code Clone Genealogies at Release level: An Empirical Study

    10th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2010)

    Code clone genealogies show how clone groups evolve with the evolution of the associated software system, and thus could provide important insights on the maintenance implications of clones. In this paper, we provide an in-depth empirical study for evaluating clone genealogies in evolving open source systems at the release level. We develop a clone genealogy extractor, examine 17 open source C, Java, C++ and C# systems of diverse varieties and study different dimensions of how clone groups evolve with the evolution of the software systems. Our study shows that majority of the clone groups of the clone genealogies either propagate without any syntactic changes or change consistently in the subsequent releases, and that many of the genealogies remain alive during the evolution. These findings seem to be consistent with the findings of a previous study that clones may not be as detrimental in software maintenance as believed to be (at least by many of us), and that instead of aggressively refactoring clones, we should possibly focus on tracking and managing clones during the evolution of software systems.

  • Analyzing and Forecasting Near-miss Clones in Evolving Software: An Empirical Study

    16th IEEE International Conference on Engineering of Complex Computer Systems (ICECCS 2011)

    Effort for development and maintenance of complex large software is believed to have dependency on the amount of duplicated code fragments (code clones) present in codebases. For example, clones need to be carefully and consistently maintained and/or refactored for preventing accidental error propagation. Thus it is important to understand the proportion and evolution of clones in evolving software systems for cost estimation or the like. This paper presents a study on the evolution of near-miss clones at release level in medium to large open source software systems of different types (operating systems, database systems, editors, etc.) written in three different programming languages namely C, C#, and Java. Using a hybrid clone detector, NiCad, we detected both exact and near-miss clones at different levels of similarity. Applying statistical methods we investigated, from different dimensions, the evolution of both exact and nearmiss clones, and also forecasted the amount of clones in future releases of the software systems. Our study offers significant insights into the existence and evolution of code clones and their relationships with programming language or paradigm and program size.

  • Conflict-aware Optimal Scheduling of Code Clone Refactoring: A Constraint Programming Approach

    19th IEEE International Conference on Program Comprehension (ICPC 2011) - student symposium

    Duplicated code, also known as code clones, are one of the malicious ‘code smells’ that often need to be removed through refactoring for enhancing maintainability. Among all the potential refactoring opportunities, the choice and order of a set of refactoring activities may have distinguishable effect on the design/code quality. Moreover, there may be dependencies and conflicts among those refactorings. The organization may also impose priorities on certain refactoring activities. Addressing all these conflicts, priorities, and dependencies, manual formulation of an optimal refactoring schedule is very expensive, if not impossible. Therefore, an automated refactoring scheduler is necessary, which will maximize benefit and minimize refactoring effort. In this paper, we present a refactoring effort model, and propose a constraint programming approach for conflict-aware optimal scheduling of code clone refactoring.

  • Evaluating Code Clone Genealogies at Release level: An Empirical Study

    10th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2010)

    Code clone genealogies show how clone groups evolve with the evolution of the associated software system, and thus could provide important insights on the maintenance implications of clones. In this paper, we provide an in-depth empirical study for evaluating clone genealogies in evolving open source systems at the release level. We develop a clone genealogy extractor, examine 17 open source C, Java, C++ and C# systems of diverse varieties and study different dimensions of how clone groups evolve with the evolution of the software systems. Our study shows that majority of the clone groups of the clone genealogies either propagate without any syntactic changes or change consistently in the subsequent releases, and that many of the genealogies remain alive during the evolution. These findings seem to be consistent with the findings of a previous study that clones may not be as detrimental in software maintenance as believed to be (at least by many of us), and that instead of aggressively refactoring clones, we should possibly focus on tracking and managing clones during the evolution of software systems.

  • Useful, but usable? Factors Affecting the Usability of APIs

    18th IEEE Working Conference on Reverse Engineering (WCRE 2011)

    Software development today has been largely dependent on the use of API libraries, frameworks, and reusable components. However, the API usability issues often increase the development cost (e.g., time, effort) and lower code quality. In this regard, we study 1,513 bug-posts across five different bug repositories, using both qualitative and quantitative analysis. We identify the API usability issues that are reflected in the bugposts from the API users, and distinguish relative significance of the usability factors. Moreover, from the lessons learned by manual investigation of the bug-posts, we provide further insight into the most frequent API usability issues.

  • Analyzing and Forecasting Near-miss Clones in Evolving Software: An Empirical Study

    16th IEEE International Conference on Engineering of Complex Computer Systems (ICECCS 2011)

    Effort for development and maintenance of complex large software is believed to have dependency on the amount of duplicated code fragments (code clones) present in codebases. For example, clones need to be carefully and consistently maintained and/or refactored for preventing accidental error propagation. Thus it is important to understand the proportion and evolution of clones in evolving software systems for cost estimation or the like. This paper presents a study on the evolution of near-miss clones at release level in medium to large open source software systems of different types (operating systems, database systems, editors, etc.) written in three different programming languages namely C, C#, and Java. Using a hybrid clone detector, NiCad, we detected both exact and near-miss clones at different levels of similarity. Applying statistical methods we investigated, from different dimensions, the evolution of both exact and nearmiss clones, and also forecasted the amount of clones in future releases of the software systems. Our study offers significant insights into the existence and evolution of code clones and their relationships with programming language or paradigm and program size.

  • Conflict-aware Optimal Scheduling of Code Clone Refactoring: A Constraint Programming Approach

    19th IEEE International Conference on Program Comprehension (ICPC 2011) - student symposium

    Duplicated code, also known as code clones, are one of the malicious ‘code smells’ that often need to be removed through refactoring for enhancing maintainability. Among all the potential refactoring opportunities, the choice and order of a set of refactoring activities may have distinguishable effect on the design/code quality. Moreover, there may be dependencies and conflicts among those refactorings. The organization may also impose priorities on certain refactoring activities. Addressing all these conflicts, priorities, and dependencies, manual formulation of an optimal refactoring schedule is very expensive, if not impossible. Therefore, an automated refactoring scheduler is necessary, which will maximize benefit and minimize refactoring effort. In this paper, we present a refactoring effort model, and propose a constraint programming approach for conflict-aware optimal scheduling of code clone refactoring.

  • Evaluating Code Clone Genealogies at Release level: An Empirical Study

    10th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2010)

    Code clone genealogies show how clone groups evolve with the evolution of the associated software system, and thus could provide important insights on the maintenance implications of clones. In this paper, we provide an in-depth empirical study for evaluating clone genealogies in evolving open source systems at the release level. We develop a clone genealogy extractor, examine 17 open source C, Java, C++ and C# systems of diverse varieties and study different dimensions of how clone groups evolve with the evolution of the software systems. Our study shows that majority of the clone groups of the clone genealogies either propagate without any syntactic changes or change consistently in the subsequent releases, and that many of the genealogies remain alive during the evolution. These findings seem to be consistent with the findings of a previous study that clones may not be as detrimental in software maintenance as believed to be (at least by many of us), and that instead of aggressively refactoring clones, we should possibly focus on tracking and managing clones during the evolution of software systems.

  • Useful, but usable? Factors Affecting the Usability of APIs

    18th IEEE Working Conference on Reverse Engineering (WCRE 2011)

    Software development today has been largely dependent on the use of API libraries, frameworks, and reusable components. However, the API usability issues often increase the development cost (e.g., time, effort) and lower code quality. In this regard, we study 1,513 bug-posts across five different bug repositories, using both qualitative and quantitative analysis. We identify the API usability issues that are reflected in the bugposts from the API users, and distinguish relative significance of the usability factors. Moreover, from the lessons learned by manual investigation of the bug-posts, we provide further insight into the most frequent API usability issues.

  • A Constraint Programming Approach to Conflict-aware Optimal Scheduling of Prioritized Code Clone Refactoring

    11th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2011)

    Duplicated code, also known as code clones, are one of the malicious ‘code smells’ that often need to be removed through refactoring for enhancing maintainability. Among all the potential refactoring opportunities, the choice and order of a set of refactoring activities may have distinguishable effect on the design/code quality. Moreover, there may be dependencies and conflicts among those refactorings. The organization may also impose priorities on certain refactoring activities. Addressing all these conflicts, priorities, and dependencies, manual formulation of an optimal refactoring schedule is very expensive, if not impossible. Therefore, an automated refactoring scheduler is necessary, which will maximize benefit and minimize refactoring effort. In this paper, we present a refactoring effort model, and propose a constraint programming approach for conflict-aware optimal scheduling of code clone refactoring.

  • Analyzing and Forecasting Near-miss Clones in Evolving Software: An Empirical Study

    16th IEEE International Conference on Engineering of Complex Computer Systems (ICECCS 2011)

    Effort for development and maintenance of complex large software is believed to have dependency on the amount of duplicated code fragments (code clones) present in codebases. For example, clones need to be carefully and consistently maintained and/or refactored for preventing accidental error propagation. Thus it is important to understand the proportion and evolution of clones in evolving software systems for cost estimation or the like. This paper presents a study on the evolution of near-miss clones at release level in medium to large open source software systems of different types (operating systems, database systems, editors, etc.) written in three different programming languages namely C, C#, and Java. Using a hybrid clone detector, NiCad, we detected both exact and near-miss clones at different levels of similarity. Applying statistical methods we investigated, from different dimensions, the evolution of both exact and nearmiss clones, and also forecasted the amount of clones in future releases of the software systems. Our study offers significant insights into the existence and evolution of code clones and their relationships with programming language or paradigm and program size.

  • Conflict-aware Optimal Scheduling of Code Clone Refactoring: A Constraint Programming Approach

    19th IEEE International Conference on Program Comprehension (ICPC 2011) - student symposium

    Duplicated code, also known as code clones, are one of the malicious ‘code smells’ that often need to be removed through refactoring for enhancing maintainability. Among all the potential refactoring opportunities, the choice and order of a set of refactoring activities may have distinguishable effect on the design/code quality. Moreover, there may be dependencies and conflicts among those refactorings. The organization may also impose priorities on certain refactoring activities. Addressing all these conflicts, priorities, and dependencies, manual formulation of an optimal refactoring schedule is very expensive, if not impossible. Therefore, an automated refactoring scheduler is necessary, which will maximize benefit and minimize refactoring effort. In this paper, we present a refactoring effort model, and propose a constraint programming approach for conflict-aware optimal scheduling of code clone refactoring.

  • Evaluating Code Clone Genealogies at Release level: An Empirical Study

    10th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2010)

    Code clone genealogies show how clone groups evolve with the evolution of the associated software system, and thus could provide important insights on the maintenance implications of clones. In this paper, we provide an in-depth empirical study for evaluating clone genealogies in evolving open source systems at the release level. We develop a clone genealogy extractor, examine 17 open source C, Java, C++ and C# systems of diverse varieties and study different dimensions of how clone groups evolve with the evolution of the software systems. Our study shows that majority of the clone groups of the clone genealogies either propagate without any syntactic changes or change consistently in the subsequent releases, and that many of the genealogies remain alive during the evolution. These findings seem to be consistent with the findings of a previous study that clones may not be as detrimental in software maintenance as believed to be (at least by many of us), and that instead of aggressively refactoring clones, we should possibly focus on tracking and managing clones during the evolution of software systems.

  • Useful, but usable? Factors Affecting the Usability of APIs

    18th IEEE Working Conference on Reverse Engineering (WCRE 2011)

    Software development today has been largely dependent on the use of API libraries, frameworks, and reusable components. However, the API usability issues often increase the development cost (e.g., time, effort) and lower code quality. In this regard, we study 1,513 bug-posts across five different bug repositories, using both qualitative and quantitative analysis. We identify the API usability issues that are reflected in the bugposts from the API users, and distinguish relative significance of the usability factors. Moreover, from the lessons learned by manual investigation of the bug-posts, we provide further insight into the most frequent API usability issues.

  • A Constraint Programming Approach to Conflict-aware Optimal Scheduling of Prioritized Code Clone Refactoring

    11th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2011)

    Duplicated code, also known as code clones, are one of the malicious ‘code smells’ that often need to be removed through refactoring for enhancing maintainability. Among all the potential refactoring opportunities, the choice and order of a set of refactoring activities may have distinguishable effect on the design/code quality. Moreover, there may be dependencies and conflicts among those refactorings. The organization may also impose priorities on certain refactoring activities. Addressing all these conflicts, priorities, and dependencies, manual formulation of an optimal refactoring schedule is very expensive, if not impossible. Therefore, an automated refactoring scheduler is necessary, which will maximize benefit and minimize refactoring effort. In this paper, we present a refactoring effort model, and propose a constraint programming approach for conflict-aware optimal scheduling of code clone refactoring.

  • Towards Flexible Code Clone Detection, Management, and Refactoring in IDE

    5th International Workshop on Software Clones (IWSC 2011)

    In this paper, we propose an IDE-based clone management system to flexibly detect, manage, and refactor both exact and near-miss code clones. Using a k-difference hybrid suffix tree algorithm we can efficiently detect both exact and near-miss clones. We have implemented the algorithm as a plugin to the Eclipse IDE, and have been extending this for real-time code clone management with semi-automated refactoring support during the actual development process.

  • Analyzing and Forecasting Near-miss Clones in Evolving Software: An Empirical Study

    16th IEEE International Conference on Engineering of Complex Computer Systems (ICECCS 2011)

    Effort for development and maintenance of complex large software is believed to have dependency on the amount of duplicated code fragments (code clones) present in codebases. For example, clones need to be carefully and consistently maintained and/or refactored for preventing accidental error propagation. Thus it is important to understand the proportion and evolution of clones in evolving software systems for cost estimation or the like. This paper presents a study on the evolution of near-miss clones at release level in medium to large open source software systems of different types (operating systems, database systems, editors, etc.) written in three different programming languages namely C, C#, and Java. Using a hybrid clone detector, NiCad, we detected both exact and near-miss clones at different levels of similarity. Applying statistical methods we investigated, from different dimensions, the evolution of both exact and nearmiss clones, and also forecasted the amount of clones in future releases of the software systems. Our study offers significant insights into the existence and evolution of code clones and their relationships with programming language or paradigm and program size.

  • Conflict-aware Optimal Scheduling of Code Clone Refactoring: A Constraint Programming Approach

    19th IEEE International Conference on Program Comprehension (ICPC 2011) - student symposium

    Duplicated code, also known as code clones, are one of the malicious ‘code smells’ that often need to be removed through refactoring for enhancing maintainability. Among all the potential refactoring opportunities, the choice and order of a set of refactoring activities may have distinguishable effect on the design/code quality. Moreover, there may be dependencies and conflicts among those refactorings. The organization may also impose priorities on certain refactoring activities. Addressing all these conflicts, priorities, and dependencies, manual formulation of an optimal refactoring schedule is very expensive, if not impossible. Therefore, an automated refactoring scheduler is necessary, which will maximize benefit and minimize refactoring effort. In this paper, we present a refactoring effort model, and propose a constraint programming approach for conflict-aware optimal scheduling of code clone refactoring.

  • Evaluating Code Clone Genealogies at Release level: An Empirical Study

    10th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2010)

    Code clone genealogies show how clone groups evolve with the evolution of the associated software system, and thus could provide important insights on the maintenance implications of clones. In this paper, we provide an in-depth empirical study for evaluating clone genealogies in evolving open source systems at the release level. We develop a clone genealogy extractor, examine 17 open source C, Java, C++ and C# systems of diverse varieties and study different dimensions of how clone groups evolve with the evolution of the software systems. Our study shows that majority of the clone groups of the clone genealogies either propagate without any syntactic changes or change consistently in the subsequent releases, and that many of the genealogies remain alive during the evolution. These findings seem to be consistent with the findings of a previous study that clones may not be as detrimental in software maintenance as believed to be (at least by many of us), and that instead of aggressively refactoring clones, we should possibly focus on tracking and managing clones during the evolution of software systems.

  • Useful, but usable? Factors Affecting the Usability of APIs

    18th IEEE Working Conference on Reverse Engineering (WCRE 2011)

    Software development today has been largely dependent on the use of API libraries, frameworks, and reusable components. However, the API usability issues often increase the development cost (e.g., time, effort) and lower code quality. In this regard, we study 1,513 bug-posts across five different bug repositories, using both qualitative and quantitative analysis. We identify the API usability issues that are reflected in the bugposts from the API users, and distinguish relative significance of the usability factors. Moreover, from the lessons learned by manual investigation of the bug-posts, we provide further insight into the most frequent API usability issues.

  • A Constraint Programming Approach to Conflict-aware Optimal Scheduling of Prioritized Code Clone Refactoring

    11th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2011)

    Duplicated code, also known as code clones, are one of the malicious ‘code smells’ that often need to be removed through refactoring for enhancing maintainability. Among all the potential refactoring opportunities, the choice and order of a set of refactoring activities may have distinguishable effect on the design/code quality. Moreover, there may be dependencies and conflicts among those refactorings. The organization may also impose priorities on certain refactoring activities. Addressing all these conflicts, priorities, and dependencies, manual formulation of an optimal refactoring schedule is very expensive, if not impossible. Therefore, an automated refactoring scheduler is necessary, which will maximize benefit and minimize refactoring effort. In this paper, we present a refactoring effort model, and propose a constraint programming approach for conflict-aware optimal scheduling of code clone refactoring.

  • Towards Flexible Code Clone Detection, Management, and Refactoring in IDE

    5th International Workshop on Software Clones (IWSC 2011)

    In this paper, we propose an IDE-based clone management system to flexibly detect, manage, and refactor both exact and near-miss code clones. Using a k-difference hybrid suffix tree algorithm we can efficiently detect both exact and near-miss clones. We have implemented the algorithm as a plugin to the Eclipse IDE, and have been extending this for real-time code clone management with semi-automated refactoring support during the actual development process.

  • The Road to Software Clone Management: A Survey

    Technical Report 2012-03, Department of Computer Science, The University of Saskatchewan

    Duplicated code or code clones are a kind of code smell that have both positive and negative impact on the development and maintenance of software systems. Software clone research in the past mostly focused on the detection and analysis of code clones, while the research in recent years indicates that clone manage- ment is taking the center stage due to its pragmatic importance. Over more than a decade of research on software clones, notably three surveys appeared in the literature, which cover the detection, analysis, and evolutionary characteristics of code clones. This paper presents a comprehensive survey on the state of the art in clone management, with in-depth investigation of clone management activities (e.g., tracing, refactor- ing, cost-benefit analysis) beyond the detection and analysis. This is the first survey on clone management, where we point to the achievements so far, and reveal avenues for further research necessary towards an integrated clone management system.

  • Analyzing and Forecasting Near-miss Clones in Evolving Software: An Empirical Study

    16th IEEE International Conference on Engineering of Complex Computer Systems (ICECCS 2011)

    Effort for development and maintenance of complex large software is believed to have dependency on the amount of duplicated code fragments (code clones) present in codebases. For example, clones need to be carefully and consistently maintained and/or refactored for preventing accidental error propagation. Thus it is important to understand the proportion and evolution of clones in evolving software systems for cost estimation or the like. This paper presents a study on the evolution of near-miss clones at release level in medium to large open source software systems of different types (operating systems, database systems, editors, etc.) written in three different programming languages namely C, C#, and Java. Using a hybrid clone detector, NiCad, we detected both exact and near-miss clones at different levels of similarity. Applying statistical methods we investigated, from different dimensions, the evolution of both exact and nearmiss clones, and also forecasted the amount of clones in future releases of the software systems. Our study offers significant insights into the existence and evolution of code clones and their relationships with programming language or paradigm and program size.

  • Conflict-aware Optimal Scheduling of Code Clone Refactoring: A Constraint Programming Approach

    19th IEEE International Conference on Program Comprehension (ICPC 2011) - student symposium

    Duplicated code, also known as code clones, are one of the malicious ‘code smells’ that often need to be removed through refactoring for enhancing maintainability. Among all the potential refactoring opportunities, the choice and order of a set of refactoring activities may have distinguishable effect on the design/code quality. Moreover, there may be dependencies and conflicts among those refactorings. The organization may also impose priorities on certain refactoring activities. Addressing all these conflicts, priorities, and dependencies, manual formulation of an optimal refactoring schedule is very expensive, if not impossible. Therefore, an automated refactoring scheduler is necessary, which will maximize benefit and minimize refactoring effort. In this paper, we present a refactoring effort model, and propose a constraint programming approach for conflict-aware optimal scheduling of code clone refactoring.

  • Evaluating Code Clone Genealogies at Release level: An Empirical Study

    10th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2010)

    Code clone genealogies show how clone groups evolve with the evolution of the associated software system, and thus could provide important insights on the maintenance implications of clones. In this paper, we provide an in-depth empirical study for evaluating clone genealogies in evolving open source systems at the release level. We develop a clone genealogy extractor, examine 17 open source C, Java, C++ and C# systems of diverse varieties and study different dimensions of how clone groups evolve with the evolution of the software systems. Our study shows that majority of the clone groups of the clone genealogies either propagate without any syntactic changes or change consistently in the subsequent releases, and that many of the genealogies remain alive during the evolution. These findings seem to be consistent with the findings of a previous study that clones may not be as detrimental in software maintenance as believed to be (at least by many of us), and that instead of aggressively refactoring clones, we should possibly focus on tracking and managing clones during the evolution of software systems.

  • Useful, but usable? Factors Affecting the Usability of APIs

    18th IEEE Working Conference on Reverse Engineering (WCRE 2011)

    Software development today has been largely dependent on the use of API libraries, frameworks, and reusable components. However, the API usability issues often increase the development cost (e.g., time, effort) and lower code quality. In this regard, we study 1,513 bug-posts across five different bug repositories, using both qualitative and quantitative analysis. We identify the API usability issues that are reflected in the bugposts from the API users, and distinguish relative significance of the usability factors. Moreover, from the lessons learned by manual investigation of the bug-posts, we provide further insight into the most frequent API usability issues.

  • A Constraint Programming Approach to Conflict-aware Optimal Scheduling of Prioritized Code Clone Refactoring

    11th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2011)

    Duplicated code, also known as code clones, are one of the malicious ‘code smells’ that often need to be removed through refactoring for enhancing maintainability. Among all the potential refactoring opportunities, the choice and order of a set of refactoring activities may have distinguishable effect on the design/code quality. Moreover, there may be dependencies and conflicts among those refactorings. The organization may also impose priorities on certain refactoring activities. Addressing all these conflicts, priorities, and dependencies, manual formulation of an optimal refactoring schedule is very expensive, if not impossible. Therefore, an automated refactoring scheduler is necessary, which will maximize benefit and minimize refactoring effort. In this paper, we present a refactoring effort model, and propose a constraint programming approach for conflict-aware optimal scheduling of code clone refactoring.

  • Towards Flexible Code Clone Detection, Management, and Refactoring in IDE

    5th International Workshop on Software Clones (IWSC 2011)

    In this paper, we propose an IDE-based clone management system to flexibly detect, manage, and refactor both exact and near-miss code clones. Using a k-difference hybrid suffix tree algorithm we can efficiently detect both exact and near-miss clones. We have implemented the algorithm as a plugin to the Eclipse IDE, and have been extending this for real-time code clone management with semi-automated refactoring support during the actual development process.

  • The Road to Software Clone Management: A Survey

    Technical Report 2012-03, Department of Computer Science, The University of Saskatchewan

    Duplicated code or code clones are a kind of code smell that have both positive and negative impact on the development and maintenance of software systems. Software clone research in the past mostly focused on the detection and analysis of code clones, while the research in recent years indicates that clone manage- ment is taking the center stage due to its pragmatic importance. Over more than a decade of research on software clones, notably three surveys appeared in the literature, which cover the detection, analysis, and evolutionary characteristics of code clones. This paper presents a comprehensive survey on the state of the art in clone management, with in-depth investigation of clone management activities (e.g., tracing, refactor- ing, cost-benefit analysis) beyond the detection and analysis. This is the first survey on clone management, where we point to the achievements so far, and reveal avenues for further research necessary towards an integrated clone management system.

  • IDE-based Real-time Focused Search for Near-miss Clones

    27th ACM Symposium On Applied Computing (ACM SAC 2012, Software Engineering Track)

    Code clone is a well-known code smell that needs to be detected and managed during the software development process. However, the existing clone detectors have one or more of the three shortcomings: (a) limitation in detecting Type- 3 clones, (b) they come as stand-alone tools separate from IDE and thus cannot support clone-aware development, (c) they overwhelm the developer with all clones from the entire code-base, instead of a focused search for clones of a selected code segment of the developer's interest. This paper presents our IDE-integrated clone search tool, that addresses all the above issues. For clone detection, we adapt a su?x-tree-based hybrid algorithm. Through an asymptotic analysis, we show that our approach for clone detection is both time and memory e?cient. Moreover, using three separate empirical studies, we demonstrate that our tool is flexibly usable for searching exact (Type-1 ) and near-miss (Type-2 and Type-3 ) clones with high precision and recall.

CMPT 370

2.5(9)