Dr. Melissa Hogan
February 16, 2026
If you’ve spent any time in edtech, you’ve seen it:
“Proven to accelerate learning.”
“Delivers measurable gains.”
“Backed by research.”
“Positive results.”
But when you ask the obvious follow-up, “Can you show me the study?”, too many vendors hand over something that isn’t evidence at all.
In other words: impact as a marketing asset, not impact as a credible claim about student learning. And districts are paying the price.
We’ve entered a new era. Districts are navigating tighter budgets, higher accountability, and a market flooded with tools that all promise results. Yet “positive outcomes” are still being treated as proof.
Here’s the problem: positive is easy to manufacture.
Without rigor, almost any implementation can be framed as a success. And when positivity becomes the bar, risk quietly replaces reassurance.
The gap between promise and proof is no longer anecdotal; it’s now documented by independent research.
Recently, The Economist delivered a blunt assessment that many educators and district leaders have quietly suspected for years: despite massive adoption and billions in spending, most education technology has failed to produce meaningful learning gains. Drawing on large-scale meta-analyses and independent research, the article concludes that the majority of tools show marginal, null, or even negative effects on student outcomes, a stark contrast to the confident impact claims that dominate vendor marketing.
This isn’t an argument against technology in classrooms. It’s an indictment of how the industry has defined and defended “impact.” Too often, short pilots, in-tool performance, usage growth, or positive anecdotes are mistaken for evidence of learning. When independent researchers apply higher standards, such as durability, transfer, and effect size, the results rarely hold. The problem isn’t innovation. It’s that for too long, EdTech has been allowed to market promise as proof.
If you’ve been in district leadership long enough, you’ve seen versions of all of these:
“Students using [Product] showed statistically significant gains in just eight weeks.” The pilot included 42 students across two classrooms.
“Teachers reported increased engagement and confidence after implementing [Product].” Survey sent only to participating teachers; response rate not disclosed.
“[District Name] saw measurable growth district-wide.” Data shown from a single, high-performing school.
“Early results demonstrate strong momentum.” No baseline comparison, no control group, no historical context.
“Growth exceeded expectations.” Expectations not defined.
Every statement above is technically positive. None of them, on their own, tell a district whether learning actually improved or whether those results would hold once scaled.
| What districts are shown | What districts need |
| --- | --- |
| Positive charts. Promising pilots. Encouraging quotes. | Evidence that student outcomes improved because of the product, beyond what would have happened otherwise, at scale, under real conditions. |

The gap between those two is where risk lives.
This is the gap districts are being asked to cross, often without realizing it.
Everything on the left can sound encouraging. Everything on the right reduces risk. Confusing the two is how marketing quietly replaces evidence.
You can generate a “positive” result from:
- a tiny pilot in a handful of classrooms
- a survey sent only to the teachers who used the product
- a single high-performing school presented as district-wide results
- growth reported with no baseline, no control group, and no historical context
- “expectations” that were never defined
None of this requires bad intent. But all of it produces the same outcome: confidence without clarity.
Positive results aren’t wrong, but they are radically incomplete. And in today’s environment, positivity without rigor isn’t reassurance. It's a risk.
Global watchdogs are flagging the same imbalance.
A UNESCO Global Education Monitoring report found that across 17 U.S. states, only 11% of teachers and administrators requested peer-reviewed evidence before adopting edtech tools.
Independent research reviews echo the pattern. A 2023 peer-reviewed study examining the 100 most widely used edtech products in U.S. schools found that only about a quarter had evidence of research-backed positive impact.
This isn’t a fringe problem. It’s a market failure: high-velocity purchasing paired with low-rigor evidence.
That’s how marketing quietly masquerades as impact.
Here’s the cleanest distinction:
Marketing assets answer: does the product sound compelling, and does the story feel positive?
Impact evidence answers: did student learning improve because of the product, beyond what would have happened otherwise, at scale, under real conditions?
A story can be true and still be misleading. Impact claims require proof.
If a vendor claims impact, districts should be able to find these elements immediately, on page one. Not buried in footnotes. Not “available upon request.” Not replaced by quotes and vibes.
1) Clear outcomes (not proxy metrics). Student learning outcomes, not just engagement, clicks, or time-on-task. If intermediate measures are used, vendors should explain why they predict learning.
2) A credible comparison (“compared to what?”). A well-matched comparison group, a quasi-experimental design with baseline equivalence, or a strong within-student design when appropriate.
3) Sample size and representativeness. Clear sample size (represented by n), inclusion criteria, and transparency about whether the sample reflects your district’s context.
4) Statistical significance and confidence. Confidence intervals or p-values, the model used, and how clustering was handled when students are nested in classrooms or schools.
5) Effect size (practical significance). Districts don’t live in p-values; they live in instructional time and learning gains. Vendors should quantify magnitude and interpret it responsibly (a minimal worked sketch follows this list).
6) Implementation fidelity and dosage. What level of use produced results? What practices mattered? Did outcomes hold across schools and student groups?
7) Transparency about limitations. Constraints, subgroup differences, null results, and threats to validity. If every chart is a win, you’re not looking at research; you’re looking at a sales narrative.
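To make items 3 through 5 concrete, here is a minimal sketch, in Python, of the quantities a transparent report should let a district reconstruct: per-group sample sizes, a significance test with a confidence interval, and an effect size. The simulated scores, group sizes, and labels below are illustrative assumptions, not real district data or any vendor’s actual analysis.

```python
# Minimal sketch of items 3-5: reporting n, a significance test, a confidence
# interval, and an effect size (Cohen's d). All data here are simulated
# post-test scores for illustration only; they are not real results.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
treatment = rng.normal(loc=52, scale=10, size=120)   # hypothetical product users
comparison = rng.normal(loc=50, scale=10, size=120)  # hypothetical matched comparison group

# 3) Sample size: report n for each group
n_t, n_c = len(treatment), len(comparison)

# 4) Statistical significance: Welch's t-test plus a 95% CI for the mean difference
t_stat, p_value = stats.ttest_ind(treatment, comparison, equal_var=False)
diff = treatment.mean() - comparison.mean()
se = np.sqrt(treatment.var(ddof=1) / n_t + comparison.var(ddof=1) / n_c)
ci_low, ci_high = diff - 1.96 * se, diff + 1.96 * se

# 5) Effect size: Cohen's d, the mean difference in pooled-standard-deviation units
pooled_sd = np.sqrt(((n_t - 1) * treatment.var(ddof=1) +
                     (n_c - 1) * comparison.var(ddof=1)) / (n_t + n_c - 2))
cohens_d = diff / pooled_sd

print(f"n = {n_t} vs {n_c}; mean diff = {diff:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
print(f"p = {p_value:.3f}; Cohen's d = {cohens_d:.2f}")

# When students are nested in classrooms or schools (the clustering point in item 4),
# a multilevel or mixed-effects model should replace this simple t-test.
```

The 1.96 multiplier assumes a normal approximation for the 95% interval; an actual study may use regression-adjusted estimates, but the same three quantities, n, uncertainty, and magnitude, should still be visible.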
The market has rewarded the wrong behaviors: the slickest one-pager, the strongest testimonial, the most confident story.
Meanwhile, real evidence takes time, costs money, and sometimes produces mixed results. That’s exactly why it’s valuable.
For years, edtech lived on proof of promise: plausible theories and early pilots. That era is ending.
The next era is proof of performance: transparent, testable claims tied to student outcomes, at scale, under real district conditions.
Districts deserve the difference. Vendors must earn it.
Because in 2026 and beyond, the question won’t be: “Does this sound innovative?” It will be: “Can you prove it worked and show me exactly how you know?”
Ask every vendor to show you how impact was proven, not just what the results were.
If they can’t clearly explain sample size, comparison groups, and significance in plain language, treat the claim as marketing, not evidence.
If districts are going to demand real evidence, the work doesn’t stop once a study clears basic standards of rigor. How results are summarized matters just as much as whether they exist.
This is where many “credible” impact claims quietly fall apart.
Even when vendors move beyond anecdotes and pilots, the most common reporting shortcut is the average. A single number meant to represent thousands of students, dozens of schools, and wildly different implementation conditions. Averages feel scientific. They’re easy to compare. And they’re often deeply misleading.
Because averages can mask who actually benefited, who didn’t, and under what conditions learning improved. They can hide inequities. They can overstate success driven by a small subset of high-fidelity classrooms. And they can give districts false confidence that impact was broad, when it was anything but.
In other words, averages can turn legitimate evidence into another form of marketing.
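A tiny, invented illustration shows how this happens: two schools can truthfully report the same average gain while telling very different stories about who actually improved. The school names, score gains, and the three-point threshold below are all hypothetical.

```python
# Hypothetical illustration: identical average gains, very different distributions.
import numpy as np

# Per-student score gains (invented numbers)
school_a = np.array([4, 5, 5, 6, 5, 5, 4, 6])     # broad, consistent gains
school_b = np.array([20, 18, 1, 0, -1, 1, 0, 1])  # gains concentrated in a few students

for name, gains in [("School A", school_a), ("School B", school_b)]:
    print(f"{name}: mean gain = {gains.mean():.1f}, "
          f"median = {np.median(gains):.1f}, "
          f"share gaining 3+ points = {(gains >= 3).mean():.0%}")

# Both schools can truthfully report a +5.0 average gain; only one of them
# improved learning for most students.
```

Distributions, medians, and subgroup breakdowns are what reveal the difference; the average alone cannot.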
In Post 2, we’ll examine why average impact is one of the most dangerous metrics in EdTech, and what districts should ask for instead if they want to understand who a tool works for, when it works, and why.
Next up: Why Average Impact Is the Most Dangerous Metric in EdTech
For definitions and key terms used throughout this series, see the Impact Reality Series Glossary of Terms below.
**Sample Size (n):** The number of students, classrooms, teachers, or schools included in an analysis. Sample size matters because small samples produce unstable results that may not generalize when a tool is scaled across a district.
**P-Value:** The probability of seeing results at least as extreme as those observed if the intervention actually had no effect. A smaller p-value indicates stronger evidence that an observed effect is real, not random. In this series, p-values are discussed as a minimum requirement for separating signal from noise, not as proof of meaningful impact on their own.
**Significance Threshold (Alpha):** The threshold used to determine whether results are considered statistically significant (commonly p < 0.05). This reflects the acceptable risk of falsely concluding that an effect exists. Statistical significance answers whether an effect is likely real, not whether it is educationally meaningful.
**Effect Size:** A measure of how large an observed effect is in practical terms. Unlike p-values, effect sizes help districts understand how much learning changed, not just whether a change occurred. In the series, effect size is emphasized as essential for interpreting instructional relevance.
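As one common example (an illustration, not the only valid measure), Cohen’s d expresses the difference between group means in standard deviation units:

$$
d = \frac{\bar{x}_{\text{treatment}} - \bar{x}_{\text{comparison}}}{s_{\text{pooled}}}
$$

A d of 0.20, for instance, means the treatment group scored about a fifth of a pooled standard deviation higher than the comparison group, a magnitude districts can weigh against instructional time and cost.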
**Baseline Equivalence:** A condition in which comparison groups start at similar levels before an intervention. Without baseline equivalence, or statistical controls to adjust for differences, it is impossible to determine whether observed outcomes reflect learning gains or pre-existing advantages.
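A minimal sketch, assuming invented pre-test scores and simple equal-weight pooling, of what a baseline-equivalence check can look like:

```python
# Hypothetical baseline check: compare pre-test scores before the intervention.
import numpy as np

pre_treatment = np.array([45, 58, 41, 55, 49, 52])   # invented pre-test scores
pre_comparison = np.array([38, 50, 35, 48, 42, 45])  # invented pre-test scores

gap = pre_treatment.mean() - pre_comparison.mean()
pooled_sd = np.sqrt((pre_treatment.var(ddof=1) + pre_comparison.var(ddof=1)) / 2)

print(f"Baseline gap = {gap:.1f} points ({gap / pooled_sd:.2f} pooled SDs)")
# A gap this large (well above commonly cited thresholds, e.g. roughly 0.25 SD in
# What Works Clearinghouse guidance) means post-test differences could reflect
# pre-existing advantages rather than learning gains.
```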
**Biased Sample:** A group of participants that does not represent the broader population (e.g., early adopters, high-performing classrooms, volunteers). Biased samples often produce overly positive results that fail to replicate at scale.
**Proxy Metrics:** Indirect measures used as stand-ins for learning outcomes, such as engagement, usage time, or completion rates. Proxy metrics can be informative, but they are not evidence of impact unless clearly linked to validated learning outcomes.
**Quasi-Experimental Design:** A research approach used when randomized controlled trials are not feasible. These designs rely on comparison groups and statistical controls to estimate causal effects. In the series, quasi-experimental designs are positioned as acceptable, but only when rigorously executed and transparently reported.
**Implementation Fidelity:** The degree to which a tool or intervention is used as intended. Fidelity matters because impact often depends on how something is implemented, not just whether it is adopted. Without fidelity data, impact claims are not actionable.
**Limitations:** Explicit acknowledgments of what an analysis cannot conclude. Limitations may include small samples, short timeframes, unmeasured variables, or context-specific findings. Transparent limitations increase trust and reduce risk for districts.
**Subgroup:** A defined subset of students or classrooms (e.g., by grade, demographic group, usage level, or performance band). Subgroup analysis reveals who benefits, who does not, and whether impact is equitable, information that averages often conceal.
**Signal:** The portion of observed outcomes that reflects real, instructionally attributable learning gains. Signal is meaningful, replicable, and distinguishable from random variation. The series frames signal as the goal of impact analysis.
**Noise:** Variation in outcomes caused by factors unrelated to the intervention itself, such as prior achievement, teacher experience, novelty effects, partial adoption, or chance. Noise can make results appear stronger or weaker than they actually are if not controlled for.
**ESSA Evidence Tiers:** A classification system under the Every Student Succeeds Act that categorizes evidence strength from Tier I (strong) to Tier IV (demonstrates a rationale). In the series, ESSA tiers are treated as a starting point, not a guarantee of ongoing or context-specific effectiveness.
**Evidence Debt:** The cumulative risk districts take on when they rely on weak, outdated, or non-revalidated evidence over time. Like technical debt, Evidence Debt accumulates quietly and becomes harder to correct the longer it goes unaddressed.
**Descriptive Evidence:** Data that summarizes what happened without establishing why it happened. Descriptive evidence can show trends or correlations but cannot determine whether an intervention caused observed outcomes.
**Causal Evidence:** Evidence designed to answer whether an intervention caused observed outcomes. Causal evidence requires comparison groups, controls, and statistical testing to isolate instructional impact from coincidence.
**Distribution (of Outcomes):** The spread of outcomes across students or classrooms, rather than a single average. Distributional analysis reveals variability, inequity, and thresholds for success that averages obscure.
**Durability:** The extent to which learning gains persist over time, across cohorts, and as implementation conditions change. In the series, durability is positioned as the ultimate test of real impact.
**Equitable Impact:** Describes impact that benefits students fairly across groups, rather than concentrating gains among already-advantaged populations. Equity cannot be inferred from averages; it must be examined through distributions and subgroup outcomes.
**Impact Decay:** The gradual erosion of learning gains as novelty fades, support decreases, or implementation becomes inconsistent. Impact decay explains why early results often fail to hold through renewal cycles.
**Impact Half-Life:** A conceptual measure of how long learning gains persist before diminishing. If gains disappear before renewal, the series argues, the impact did not endure; it expired.
**Decay:** A general term describing the loss of effectiveness over time. In the context of EdTech impact, decay reflects systems that produce short-term momentum but lack mechanisms for sustained instructional improvement.
**Coherence:** The alignment of curriculum, instructional practice, professional learning, and assessment over time. Coherence matters because fragmented tools, even effective ones, undermine sustained impact when they do not reinforce shared instructional goals.