AI safety institute to test children's AI tools, create benchmarks

A new industry-backed research lab plans to test AI tools used by children and create safety benchmarks for tech companies.

A new industry-backed research lab is launching with a focused mission: test the AI tools children use every day and build safety benchmarks that tech companies can follow. The initiative comes at a moment when families are struggling to gauge how safe these tools really are — and when tech companies are under growing pressure to prove they are not exposing young users to harm.

The lab, whose full name and founding partners were not disclosed in the briefing, describes itself as an independent research organization. But the phrase "industry-backed" means the funding comes at least partly from the same companies whose products it will evaluate. That tension — independence versus industry influence — will be central to whether the lab earns trust from parents, educators, and regulators.

What the lab will do

According to the announcement, the lab plans to conduct systematic testing of AI tools that are marketed to or commonly used by children. That includes everything from educational chatbots and homework helpers to AI-powered games, virtual tutors, and content-filtering tools. The goal is to identify risks that are not obvious from a surface-level review — such as biased responses, inappropriate content generation, data privacy leaks, or manipulative design patterns.

The lab will then use those findings to develop a set of safety benchmarks. These benchmarks would be offered to tech companies as a standard framework for evaluating their own products before release. In theory, a company that passes the lab's benchmarks could advertise that its AI tool has been independently verified as safe for children.

The briefing did not specify a timeline for when the first benchmarks would be published, nor did it name any companies that have already agreed to participate in testing.

Why this matters for families

Right now, parents face a confusing landscape. A child might use an AI chatbot to help with homework, a voice assistant to set reminders, and a generative AI feature inside a social media app — all in the same afternoon. Each tool has its own terms of service, privacy policy, and safety features. Most parents lack the technical background to evaluate whether those protections are adequate.

A 2023 survey by the Pew Research Center found that nearly half of parents of teens had little to no confidence that AI companies would protect their children's data. Since then, the pace of AI adoption has only accelerated. New products launch weekly, and many are designed to appeal directly to younger users — often without clear labels about what data they collect or how they moderate content.

The lab's work could close that information gap. A standardized set of benchmarks would give families a shorthand: if a product meets the lab's criteria, it has passed a third-party evaluation. That is similar to how the nonprofit Common Sense Media rates movies, apps, and games for age-appropriateness and educational value, though the lab's focus would be on technical safety rather than content ratings.

Real-world use cases for the benchmarks

Consider a voice assistant designed for kids. Parents may want to know whether the assistant can refuse to answer inappropriate questions, whether it stores audio recordings, and whether those recordings are ever shared with advertisers. The lab's benchmarks could define minimum requirements in each of those areas.

Or take an AI tutor that uses large language models to help students write essays. Without safety testing, the model might inadvertently generate harmful suggestions or plagiarized content. The lab could create tests that simulate these failure modes and require companies to demonstrate they have mitigation strategies in place.

Even simpler tools, such as AI-powered drawing apps that generate images from text prompts, pose risks. They could produce inappropriate imagery or collect a child's drawings and the associated prompts into a training dataset. The benchmarks would likely require transparency about whether data is used for model training and what kind of content filters are active.

Potential limitations and concerns

The biggest question mark around the lab is its independence. Industry backing usually means the lab's budget comes from donations or grants provided by companies that have a vested interest in the outcomes. If a major tech company funds the lab and then fails a safety benchmark, the lab could face pressure to soften its findings.

The briefing did not describe how the lab plans to govern itself — whether it has an independent board, whether its findings will be published in full, or whether companies can opt out of having their products tested. Without those structural details, the lab risks being seen as a PR exercise rather than a genuine safety initiative.

Another limitation is scope. The lab appears to be focused on testing tools used by children, but children also interact with AI through platforms that are not labeled as children's products. A teenager using a general-purpose AI assistant on a smartphone, for example, falls outside this testing scope — even though the same risks apply.

Finally, benchmarks are only useful if companies choose to adopt them. The lab has not announced any commitments from tech companies to implement its standards. Without a carrot or a stick — such as a certification badge that consumers can easily recognize, or regulatory pressure to comply — the benchmarks may end up gathering dust on a shelf.

Broader industry context

The lab enters a crowded field of AI safety initiatives. Several governments have proposed or enacted laws requiring safety testing for high-risk AI systems, including those used by children. The European Union's AI Act, for example, classifies AI systems that interact with minors as high-risk and imposes transparency and conformity assessment requirements. In the United States, the Biden administration issued an executive order on AI safety in late 2023 that called for standards around AI-generated content and safety testing, but legislation has been slow to follow.

Nonprofit organizations have also stepped in. The responsible AI movement has produced numerous frameworks for fairness, accountability, and transparency. But few of those frameworks are tailored specifically to children, and even fewer come with a concrete testing protocol that companies can implement today.

The new lab could fill that niche — provided it builds enough credibility and momentum. Its success will depend on three things: publishing clear, methodologically sound benchmarks; attracting independent researchers to participate; and securing at least a handful of prominent companies that agree to be evaluated publicly.

What comes next

The lab plans to begin testing and benchmarking in the coming months. Families and educators who want to stay informed can watch for the release of its first public findings. For tech companies building AI products aimed at children, the message is clear: independent scrutiny is coming, and it will be better to participate voluntarily than to be caught off guard by mandatory regulations.

SysCall News will continue to track this initiative as more details emerge about its leadership, funding, and the specific benchmarks it develops. For now, the lab represents a promising step toward giving families the tools they need to make informed decisions about the AI their children use every day.