How to use this toolkit

The toolkit is structured to guide program managers and policy officers through completing two key documents: the Evaluation overview and the Evaluation work plan. The key underlying resources are the Manager’s guide to evaluation from the BetterEvaluation website and the NSW Evaluation Toolkit.

There are many different approaches to evaluation and this toolkit is not intended to be exhaustive. It will be updated regularly in response to user feedback. If you would like to suggest a change or if you need further information on how to use the toolkit, please email


The Territory Government is focused on improving evidence-based decision-making as part of A plan for budget repair. This will be supported by a whole of government approach to evaluation in the Territory to help drive a culture of continuous improvement across government.

Program evaluation aims to improve government services to achieve better outcomes for Territorians. This is important because sometimes programs don’t work in reality and may even cause unintended harm. Program managers need to know whether their programs are helping people and whether implementing changes could help more people within the same budget. Without evaluation, there is a risk that poorly performing programs continue without change, slowing progress towards achieving desired outcomes and potentially wasting taxpayers’ money.


Scared Straight and other similar programs involve organised visits to prison by juvenile delinquents or children at risk of criminal behaviour. These programs are designed to deter participants from future offending through firsthand observation of prison life and interaction with adult inmates. A Cochrane review found that these programs fail to deter crime and actually lead to more offending behaviour. See Petrosino et al., 2013, ‘Scared Straight and other juvenile awareness programs for preventing juvenile delinquency’, accessed October 2020.

When an evaluation shows a program is not working well, managers can use the evaluation findings to improve the program by either modifying the existing program or taking a new approach. Each evaluation is an opportunity to learn by either demonstrating what works well or what does not. Over time, evaluations build an evidence base of what works in the Territory and foster a culture of continuous improvement.

Central oversight is critical to developing a strategic whole of government approach to evaluation and strengthening evaluation culture.[1] A centralised approach to program evaluation supports:

  • a consistent standard of evaluation across agencies
  • an ability to identify systemic issues across government
  • capacity to set strategic priorities for and identify gaps in evaluation
  • accountability for multi-agency and whole of government programs
  • coordinated capability building, resourcing, data collection, reporting and evaluative effort
  • a centralised repository of evaluations to enhance continuous learning and quality improvement.

Under the Territory’s approach, evaluation activity will continue to be undertaken primarily by the agency delivering the program (this may include using external experts commissioned by the agency). This is necessary to maintain a close link between the evaluation and the program area with relevant subject matter knowledge and experience.

Evaluation activity will be overseen, coordinated and supported by the Program Evaluation Unit (PEU) within the Department of Treasury and Finance (DTF), supported by the Department of the Chief Minister and Cabinet (CMC), the Office of the Commissioner for Public Employment (OCPE) and the Department of Corporate and Digital Development (DCDD).

Program evaluation framework

The Program evaluation framework integrates evaluation into the government’s policy and budget development processes. The framework aims to improve transparency and accountability, and encourage better use of Territory Government funds by:

  • ensuring new programs and extensions to existing programs have identified goals and objectives that are achievable and measurable, or include actions to develop measurement as part of the program
  • ensuring new programs and extensions to existing programs have an evaluation strategy
  • applying sunset provisions to new programs (or extensions to existing programs), where the decision for further funding is informed by evaluation outcomes
  • establishing a rolling schedule of evaluations to ensure existing programs are evaluated over time
  • providing a clear mandate for agencies to evaluate their programs and target their investments
  • outlining expected evaluation principles and standards
  • providing government with clear advice about the costs and benefits of evaluation (including data collection and analysis) to help inform evaluation decisions
  • establishing a protocol for policy and program officers to plan for evaluation across the program lifecycle (with a step-by-step guide in the program evaluation toolkit)
  • establishing a tiered system of evaluations to ensure evaluation is proportionate to the cost, risk and complexity of a program
  • describing how the Territory Government can build evaluation capability within the Northern Territory Public Sector and foster a culture of continuous improvement
  • outlining how the Territory Government will measure progress in implementing the framework.

Territory Government agencies must use the framework and toolkit to help plan, commission and use evaluations. The framework and toolkit may also provide useful guidance for Territory Government service delivery partners and external evaluators of Territory Government programs.

The Program evaluation framework is underpinned by 10 best practice evaluation principles:[2]

  1. Build evaluation into program design – plan the evaluation as part of program design to ensure clearly defined objectives and measurable outcomes prior to commencement.
  2. Base the evaluation on sound methodology – adopt a best practice evaluation methodology that is commensurate with the program’s size, significance and risk.
  3. Allocate resources and time to evaluate – include provision for the required evaluation resources and timeframes when planning and budgeting for a program. Ensure evaluation findings are available when needed to support key decision points.
  4. Use the right mix of expertise and independence – use evaluators who are experienced and independent from program managers, but include program managers in evaluation planning.
  5. Ensure robust governance and oversight – establish governance processes to ensure programs are designed and evaluated in accordance with this framework, including meeting reporting requirements.
  6. Be ethical in design and conduct – carefully consider the ethical implications of any evaluation activity, particularly collecting and using personal data, and any potential impacts on vulnerable groups.[3]
  7. Be informed and guided by relevant stakeholders – listen to stakeholders, including program participants, government and non-government staff involved in managing and delivering the program, and senior decision makers.
  8. Consider and use evaluation data meaningfully – include clear statements of findings, recommendations or key messages for consideration in evaluation reports. Use reports to inform decisions about program changes.
  9. Be transparent and open to scrutiny – disseminate key information to relevant stakeholders, including methodologies, assumptions, analyses and findings.
  10. Promote equity and inclusivity – harness the perspectives of vulnerable groups during evaluations, to enable fair and socially just outcomes.

Treasurer’s Direction on Performance and Accountability

Treasurer’s Directions are mandatory requirements that specify the practices and procedures that must be observed by Accountable Officers in the financial management of their agencies (Financial Management Act 1995). DTF is currently developing a Performance and Accountability Treasurer’s Direction that will set out the minimum requirements for all Territory Government agencies for:

  • planning objectives and actions
  • managing or delivering services
  • performance reporting
  • reviewing and evaluating outcomes.

Guidance on performance and accountability will be provided to agencies to assist them in complying with the requirements of the Performance and Accountability Treasurer’s Direction.

Program evaluation, organisational reviews and audits – what is the difference?

For the purposes of the Program evaluation framework:

Evaluation is:

“A systematic and objective process to make judgements about the merit or worth of one or more programs, usually in relation to their effectiveness, efficiency and appropriateness.”[4] This definition applies more to outcome and impact evaluations than to process evaluations, which tend to focus more on monitoring.

Monitoring is:

“A management process to periodically report against planned targets or key performance indicators that, for the most part, is not concerned with questions about the purpose, merit or relevance of the program.”[4]

While there are a number of different approaches to evaluation,[5] the Program evaluation framework is based on three types,[6] linked to a program’s lifecycle:

  1. Process evaluation – considers program design and initial implementation.
  2. Outcomes evaluation – considers program implementation and short to medium term outcomes.
  3. Impact evaluation – considers medium to long term outcomes, whether the program contributed to the outcomes and represented value for money.

The different types of evaluation are covered in more detail in section 2.5.3 Types of evaluation.

Program evaluation is most effective when it is complemented by other activities which collect information and assess performance including:

  • organisational reviews – consider an agency’s entire budget to ensure expenditure is aligned to government priorities and services are being provided efficiently. Implementation of a rolling schedule of organisational reviews was a recommendation in A plan for budget repair. DTF is currently developing an Agency Organisational Review Framework.
  • program reviews – typically quick, operational assessments of a program to inform continuous improvement.[7]
  • research – closely related to evaluation, but can ask different types of questions that may not relate to the merit or worth of a program.[7]
  • external audits – undertaken by an independent auditor to review records supporting financial statements.[8]
  • internal audits – undertaken by agencies to review governance, risk management and control processes according to a risk-based need.[8]
  • performance management system (PMS) audits – undertaken by the Northern Territory Auditor-General to consider whether appropriate systems exist and are effective in enabling agencies to manage their outputs.[9]
  • performance audits – undertaken by an Auditor-General, but not currently within the scope of the Northern Territory Auditor-General, to examine the economy, efficiency and effectiveness of government programs and organisations.[10], [11]
  • Independent Commissioner Against Corruption (ICAC) audits or reviews – investigate the practices, policies or procedures of a public body or public officer to identify whether improper conduct has occurred, is occurring or is at risk of occurring.

Further definitions are in the Glossary.

[1] Evaluation and learning from failure and success, ANZSOG, 2019.

[2] Adapted from the NSW Government Program Evaluation Guidelines.

[3] In some circumstances, formal review and approval from an ethics committee certified by the National Health and Medical Research Council may be required. See Ethical considerations for further information.

[4] NSW Government Evaluation Framework August 2013.

[5] For information about other evaluation types, please see the BetterEvaluation website.

[6] Further information on these three evaluation types is in sections 3.2.1 to 3.2.3.

[7] NSW Evaluation toolkit.

[8] The Institute of Internal Auditors Australia, Internal Audit Essentials, 2018.

[9] Northern Territory Auditor-General’s Office Annual Report 2017-18.

[10] Black, M., 2018, Strategic Review of the Northern Territory Auditor-General’s Office.

[11] A Guide to Conducting Performance Audits, Australian National Audit Office, 2017.

Evaluation as part of the program cycle

For the purposes of the Program evaluation framework, a program is broadly defined as: “A set of activities managed together over a sustained period of time that aim to deliver an outcome for a client or client group”.[1] Essentially, programs deliver government functions.[2] A program can also include related government spending on a single intended outcome.[2] A program can be as broad as all government expenditure to reduce cost of living pressures, or as specific as a single social concession.[2] Education, health and policing services are deemed functions, not programs. Further, as a general rule, the Program evaluation framework will not apply to infrastructure and information and communications technology projects (which are covered by separate review processes) or externally funded programs.

The term ‘program’ is sometimes used interchangeably with project, service, initiative, strategy or policy. In practice, programs vary in size, duration and structure, and may span multiple agencies. Whole of government programs can be large and significant strategies, action plans or frameworks that encompass multiple agencies and locations, and comprise many agency-level programs, sub-programs and projects (Figure 1). Regardless of program size, when designed and conducted well, evaluation can yield useful evidence about the effectiveness of programs.

A strategic approach to evaluation (see Evaluating strategically) includes evaluations at several levels. For example, evaluating at the whole of government level to identify how different components of a strategy work together to achieve outcomes, and evaluating at the project level to examine specific aspects of a program.

Figure 1: Program hierarchy

Inverted pyramid image

Integrating evaluation into the program lifecycle ensures cost-effective evaluation is delivered in time to support key decision making points (Figure 2). Planning for evaluation should start at the program design stage so all stakeholders understand the key performance indicators the program will be assessed against and how and when evaluation will occur. Early planning also ensures data requirements are identified prior to commencement and lessons learned from previous evaluations can be used effectively.

Figure 2: Integrating evaluation into the program lifecycle

Program cycle diagram

Further guidance on good practice policy development is available on the APS Policy Hub.

Evaluation as part of budget development

Integrating evaluation into the budget process allows governments to make better use of resources.[3] The program evaluation framework integrates evaluation into the Territory Government’s budget process through an evaluation overview as part of the Cabinet submission template,[4] an evaluation work plan for approved programs, sunset clauses and a rolling schedule of evaluations.

Evaluation overview

An evaluation overview is required as part of the Cabinet submission process for programs requesting funding of $1 million or more in a year. The overview should be a concise summary of the key outcomes the program is trying to achieve and how success will be measured (see section 1 Complete the evaluation overview for further information).

A full evaluation work plan will be required within six months if the program is approved to proceed.

Evaluation work plan

The detailed evaluation work plan[5] outlines future evaluation activity for a particular program over the next five years. The template requires agencies to consider:

  • the program’s theory of change (the program logic model)
  • key evaluation questions, indicators, and data sources (the question bank and data matrix)
  • appropriate types and timing of future evaluations (combined with the logic model and data matrix to form the program’s evaluation work plan).

See section 2. Complete the evaluation work plan for further guidance on completing the evaluation work plan.

Sunset clauses

Programs subject to sunset clauses are funded for a finite period, with the decision for further funding (either wholly or in part) informed by an evaluation.

Currently, agencies’ annual recurrent budgets represent an accumulation of funding decisions by a variety of governments over time. Without program evaluation, the Budget Review Subcommittee and Cabinet have little visibility of:

  • the effectiveness of existing programs
  • the applicability of existing programs to the current policy context
  • whether delivery models for the existing program provide the best value for money.[2]

A sunset clause is a built-in decision point for government. Unless otherwise directed by Cabinet, funding for new programs (or extensions of existing programs) that impact the Territory Government’s operating balance by $1 million or more in a year will be subject to an initial five-year sunset clause. This ensures that ongoing funding for programs is informed by evaluation.

Sunset clauses have been included in the Cabinet template and handbook and will be supported by a:

  • Sunset clause guide for agencies (in development)
  • Sunset clause guide for Treasury analysts (in development).

[1] NSW Government Evaluation Framework August 2013.

[2] Program Evaluation: Sunset Clauses, Agency Guide, Government of Western Australia, 2014.

[3] Building Better Policies, Chapter 6 Monitoring and Evaluation Systems and the Budget, World Bank, 2012.

[4] In the absence of exceptional circumstances, all submissions seeking additional funding must be considered as part of the Budget development process under the Northern Territory Government’s Charter of Budget Discipline.

[5] The Evaluation work plan template has been adapted from the Evaluation strategy template with permission from the Evaluation Office at the Commonwealth Department of Industry, Science, Energy and Resources. Publicly available at the APS Policy Hub.

Monitoring and evaluation requires the commitment of resources. If an evaluation does not provide decision-makers with meaningful information it reduces resources available for program implementation.[1] Therefore, it is necessary to balance the cost of evaluation and the risk of not evaluating, noting that sometimes monitoring will be sufficient. While outcome and impact evaluations are important, well-designed data collection, program monitoring and process evaluations can help refine programs over time with minimal cost.[2]

Agencies and program managers will need to take a strategic approach in determining appropriate evaluation scopes, designs and resourcing requirements. For some programs, evaluation could simply involve routine assessment of activities and outputs built into program reporting, while for others evaluation will need to be more comprehensive and assess whether the program is appropriate, effective and efficient.[3]

Although it is not feasible, cost effective or appropriate to fully evaluate all Territory Government programs, some level of monitoring and review should be considered for all programs.

For whole of government programs, or programs with multiple components, it may be necessary to evaluate components separately as well as collectively, considering questions such as:

  • Which program initiatives are providing the greatest impact?
  • Which elements of program delivery are most effective in generating desired outcomes?
  • Is greater impact achieved when specific strategies are combined into a package of initiatives?
  • In what contexts are mechanisms of change triggered to achieve desired outcomes?[3]

Evaluations should aim to achieve the highest rigour for the lowest cost by:

  • incorporating evaluation planning at the initial program design stage
  • collecting the required data for monitoring and evaluation throughout program implementation and aligning this to existing data collections where possible
  • using a tiered approach that prioritises evaluative effort (see Prioritising evaluations for rolling schedule for further information).

Developing the annual program master list

To balance evaluative effort against the potential benefit, agencies need to review their existing stock of programs and prioritise evaluations.

A good practice starting point is a program master list, which is designed to capture all current Territory Government-funded programs to help prioritise evaluations. It identifies the extent to which existing programs have been evaluated and the proposed timing of any future evaluation.[2, 4] Agencies are asked to complete the Program master list template annually as part of the Budget development process. Integrating the program master list into the Budget development process ensures new Cabinet submissions are considered within the context of existing government programs and the available evidence base from completed evaluations.
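As an illustration only, a master list entry can be thought of as a simple record of a program, its tier, and its evaluation history. The field names below are hypothetical and do not reflect the official Program master list template.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MasterListEntry:
    """One row of a hypothetical program master list (illustrative fields only)."""
    program_name: str
    agency: str
    tier: int                            # 1 (lowest) to 4 (highest), per Table 1
    previously_evaluated: bool
    last_evaluation_year: Optional[int]  # None if the program has never been evaluated
    proposed_evaluation_year: Optional[int]

# Example: a program that has never been evaluated, flagged for a future evaluation
entry = MasterListEntry(
    program_name="Example community program",
    agency="Example agency",
    tier=3,
    previously_evaluated=False,
    last_evaluation_year=None,
    proposed_evaluation_year=2022,
)
```

Capturing these fields consistently across agencies is what allows the master list to show, at a glance, the extent to which existing programs have been evaluated and when future evaluations are proposed.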

In 2016, the New South Wales Auditor-General undertook a performance audit of the NSW Government’s program evaluation initiative. The audit set out the good practice model expected from each agency to prepare an evaluation schedule, including a master list of all current agency programs with their tier ranking and linkage to government priorities. NSW Auditor-General’s Report to Parliament, Implementation of the NSW Government’s program evaluation initiative, 2016, accessed October 2020.

Agency activities that are not captured as part of the program master list will still be scrutinised as part of broader organisational reviews.

Prioritising evaluations for rolling schedule

To help manage and prioritise evaluations, agencies are required to prepare multi-year rolling evaluation schedules that are reviewed annually by the Budget Review Subcommittee of Cabinet. In addition to evaluating new programs in accordance with the approved evaluation overview, the schedules will be expected to include a list of existing programs planned for evaluation, including the tier and expected evaluation timeframe.

Evaluating existing programs can be complex and expensive, particularly where the data required to answer basic evaluation questions has not been collected. The section ‘Getting existing programs ready for evaluation’ has further guidance.

The evaluation schedule for each agency should be aligned to agency corporate planning cycles and internal decision-making processes and should be developed in consultation with DTF and CMC.

A whole of government evaluation schedule will be compiled by DTF and submitted to the Budget Review Subcommittee of Cabinet for approval along with an annual summary of evaluation findings for the previous year. An example of an evaluation schedule is available from the Commonwealth Department of Industry, Innovation and Science.

Table 1 provides a guide to prioritising programs for the rolling schedule of evaluation. A best-fit approach should be used to categorise programs (that is, a program does not need to satisfy every characteristic to fall into a particular tier).

Table 1. A guide to program tiers, evaluation types and timing[5]

Tier 4

Characteristics of program:

  • Priority: strategic priority for government
  • Program accountability: Cabinet or Cabinet subcommittee
  • Funding: significant government/agency funding
  • Risk: high risk (either to government or the community)
  • Scope: multiple government agencies and/or multiple external delivery partners
  • Other factors: lack of evidence base, major external reporting requirements (for example, Commonwealth), innovative approach

Evaluation type and timing: process evaluation at 1 year, outcomes evaluation at 2 years, impact evaluation at 3–5 years.

Tier 3

Characteristics of program:

  • Priority: strategic priority for agency
  • Program accountability: portfolio Minister(s)
  • Funding: significant agency funding
  • Risk: moderate to high risk
  • Scope: multiple government agencies and/or external delivery partners
  • Other factors: lack of evidence base, internal reporting and evaluation requirement

Evaluation type and timing: process evaluation at 1 year, outcomes evaluation at 2 years, impact or outcomes evaluation at 3–5 years.

Tier 2

Characteristics of program:

  • Priority: named in agency strategic plan
  • Program accountability: agency chief executive
  • Funding: moderate agency funding
  • Risk: low to moderate
  • Scope: responsibility of single agency, may involve external delivery partners
  • Other factors: limited evidence base, internal reporting and evaluation requirement

Evaluation type and timing: process evaluation at 1 year, outcomes evaluation at 3–5 years.

Tier 1

Characteristics of program:

  • Priority: low or emerging strategic priority for agency
  • Program accountability: business unit within agency
  • Funding: limited agency funding
  • Risk: low
  • Scope: single agency, may involve external delivery partners
  • Other factors: local delivery similar to other successful programs

Evaluation type and timing: process evaluation at 1 year, further process evaluation at 3–5 years.

The appropriate evaluation types and timing will need to be determined on a case-by-case basis to ensure the overall evaluation approach is fit-for-purpose.

When prioritising evaluations, agencies should give priority to:

  • tier 3 and tier 4 programs (as per the program tiering in Table 1)
  • programs that have not previously been evaluated
  • programs for which evaluation is required by Cabinet (for example, in line with an evaluation overview approved by Cabinet).

Tier 3 and Tier 4 programs should be prioritised for evaluation and would usually be expected to go through process, outcome and/or impact evaluations over the program lifecycle.

The prioritisation of Tier 1 and 2 programs is at the discretion of agencies but should be influenced by how they fit into higher tier programs (if applicable). In particular, agencies should consider evaluating small programs if they will be used to inform decisions about whether to roll out the program to a wider area and/or client group (such as a pilot or a trial) or will be used as evidence of another program’s effectiveness.[3]
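The prioritisation rules above can be sketched as a simple scoring function, with higher scores scheduled sooner. The function name and weights are illustrative assumptions, not part of the framework.

```python
def priority_score(tier: int, previously_evaluated: bool, cabinet_required: bool) -> int:
    """Rank a program for the rolling evaluation schedule (higher = schedule sooner).

    Encodes the three prioritisation rules as an illustrative score:
    Cabinet-required evaluations first, then tier 3 and 4 programs,
    then programs never evaluated before.
    """
    score = 0
    if cabinet_required:            # evaluation required by Cabinet
        score += 4
    if tier >= 3:                   # tier 3 and tier 4 programs
        score += 2
    if not previously_evaluated:    # not previously evaluated
        score += 1
    return score

# Hypothetical programs: (name, tier, previously_evaluated, cabinet_required)
programs = [
    ("A", 4, False, False),
    ("B", 1, True, False),
    ("C", 2, False, True),
]
ranked = sorted(programs, key=lambda p: priority_score(p[1], p[2], p[3]), reverse=True)
# → program C (Cabinet-required) ranks first, then A (tier 4, never evaluated), then B
```

In practice the ranking would sit alongside agency judgement about corporate planning cycles and decision points, not replace it.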

Getting existing programs ready for evaluation

The Program evaluation framework emphasises the importance of planning for evaluation and data capture at the program design stage. However, existing programs without an evaluation work plan should also be periodically reviewed because:

  • the bulk of government spending relates to legacy programs
  • the nature and outcomes of these programs may have evolved or drifted away from their initial rationale or purpose over time
  • legacy programs have the potential to become embedded or institutionalised by the participants or community in ways that may have significantly affected their outcomes.

Evaluating programs that were not designed with evaluation in mind can be complex and expensive.[6] Completing an evaluation work plan (see section 2. Complete the evaluation work plan) can assist to get programs ‘evaluation ready’. An important first step is clarifying what the program aims to achieve and how it tries to achieve this by developing a program logic (further information in section 2.5.1. Program logic).[7] Developing a program logic for an existing program can be an uncomfortable process. Stakeholders may disagree about how a program works or even what it is aiming to achieve. The program logic may reveal that the program is not well formulated or that it includes dubious assumptions. To genuinely support learning and improvement, developing a program logic should not try to rationalise past program decisions. Instead, developing a program logic for an existing program should be an opportunity to question, debate and learn. This process can help agencies identify unnecessary activities and make space for more important ones.[7]

The program’s stage will also have implications for the evaluation design (see Table 2, from the BetterEvaluation website).[8]

Table 2. Evaluation design at different program stages

Not yet started

  • Consequence: can set up data collection from the beginning of implementation. Implications: possible to gather baseline data as a point of comparison and to establish comparison or control groups from the beginning; opportunity to build some data collection into administrative systems to reduce costs and increase coverage.
  • Consequence: period of data collection will be long. Implication: need to develop robust data collection systems, including quality control and storage.

Part way through implementation

  • Consequence: cannot get baseline data unless this has already been set up. Implication: will need to construct retrospective baseline data to estimate changes that have occurred.
  • Consequence: might be able to identify ‘bright spots’ where there seems to be more success and those with less success. Implication: scope to do purposeful sampling and learn from particular successes, as well as from cases that have failed to make much progress.

Almost completed

  • Consequence: cannot get baseline data unless this has already been set up. Implication: will need to construct retrospective baseline data to estimate changes that have occurred.
  • Consequence: depending on timeframes, some outcomes and impacts might already be evidenced. Implication: opportunity to gather evidence of outcomes and impacts.

Completed

  • Consequence: cannot get baseline data unless this has already been set up. Implication: will need to construct retrospective baseline data to estimate changes that have occurred.
  • Consequence: depending on timeframes, some outcomes and impacts might already be evidenced. Implication: opportunity to gather evidence of outcomes and impacts.
  • Consequence: cannot directly observe implementation. Implication: will depend on existing data or retrospective recollections about implementation.

[1] M. K. Gugerty, D. Karlan, The Goldilocks Challenge: Right Fit Evidence for the Social Sector, New York, Oxford University Press, 2018.

[2] APS Review, Evaluation and learning from failure and success.

[3] Queensland Government Program Evaluation Guidelines.

[4] 2016 NSW Auditor-General’s Report to Parliament, Implementation of the NSW Government’s program evaluation initiative.

[5] Adapted from DIIS Evaluation Strategy 2017-2021 and NSW guidelines and WA guidelines.

[6] C. Althaus, P. Bridgman, G. Davis, The Australian Policy Handbook: A practical guide to the policy making process, 6th edition, Sydney, Allen and Unwin, 2018.

[7] M. K. Gugerty, D. Karlan, The Goldilocks Challenge: Right Fit Evidence for the Social Sector, New York, Oxford University Press, 2018.

[8] BetterEvaluation

This toolkit is divided into six steps for planning and commissioning an evaluation. Table 3 shows how the numbered sections in the toolkit match these steps.

The evaluation planning that occurs in steps 1 and 2 starts during program design and may take a substantial amount of time. Sections 1 and 2 of the toolkit are structured to mirror the Evaluation overview template and the Evaluation work plan template to give step-by-step guidance through the templates.

Table 3. Stages for planning and commissioning an evaluation

Program design

  1. Complete the evaluation overview – how to concisely summarise what the program is aiming to achieve, how it will achieve this, external factors that may affect success, how the program’s success will be measured, what evaluations will be required (and when), and other evaluation-related resource requirements.

Before the evaluation

  2. Complete the evaluation work plan – how to:
     • decide the appropriate evaluation methodology, including program logic, key evaluation questions, data matrix and ethical considerations
     • consider implementation, including roles, responsibilities and resourcing
     • identify risks.
  3. Engage the evaluation team – how to select the right evaluation team and apply the procurement governance policy.

During the evaluation

  4. Manage the implementation of the evaluation – the role of the evaluation manager in overseeing the implementation of the evaluation work plan.
  5. Guide production of a quality evaluation report – how to structure an evaluation report, including succinct reporting of the evaluation findings. This step also outlines the minimum requirements of an evaluation report.

After the evaluation

  6. Disseminate results and support use of evaluation – how to appropriately communicate evaluation results and respond to recommendations.

What’s the difference between an evaluation overview and an evaluation work plan?

An evaluation overview briefly summarises a program’s evaluation requirements as part of the Cabinet submission process (see section 1 Complete the evaluation overview). If the program is approved to proceed, an evaluation work plan details the evaluation requirements, including a full program logic, evaluation questions and data matrix (see section 2 Complete the evaluation work plan).

Last updated: 11 January 2021
