Frequently Asked Questions: Request for Proposals #1 – Formative Assessment

Submit Your Questions

We will continuously update this section as we receive questions about the Request for Proposals. Please submit a question using our contact form or send an email to k12ai-infrastructure [at] digitalpromise.org.


Notes from the Request for Proposals #1

  • Respond to the Request by March 8, 2026, 11:59:59 PM Pacific Standard Time
  • We are not presently investing in student or classroom video, or in sensor-based approaches (e.g., eye-tracking) that face practicality challenges or heightened privacy concerns.
  • Proposals to develop proprietary product enhancements will be returned without review.
  • Partnerships among organizations are encouraged.
  • Please note that the application form will ask you to consent to your proposal being shared with a broader pool of funders for potential funding opportunities beyond the scope of the K-12 AI Infrastructure Program.

Where is the online application? Please use this link to access the online submission.
How many submissions can one organization submit? Each organization is permitted to submit one application per track. For institutions and organizations with more than 2,000 staff members, each department can submit one application total (regardless of track). In that case, organizations should clearly identify which department they are representing.
Are non-profits under fiscal sponsorship eligible? Yes, non-profits under fiscal sponsorship are eligible to apply.
Are organizations based outside of the United States eligible to apply? At this time, this opportunity is open only to U.S.-based applicants.
What happens after I submit? We appreciate the time taken to submit to this RFP. The K-12 AI Infrastructure Program team will lead a thorough review of all submissions. Select applicants may be invited to answer additional questions. All applicants should expect to hear an update by April 2026.
Can I submit an idea I’ve submitted to other funding calls? You are welcome to submit an idea that you have submitted to other funding opportunities, as long as no conflict of interest exists.
How do you define public goods? Public goods are licensed resources that can be broadly used and incorporated – directly or indirectly – by developers of AI-enabled tools and infrastructure for K-12 education, in order to improve the technology products that teachers and students rely on.

Note: While we are not interested in advancing video monitoring or sensor-based solutions, video may be part of the data collected and, once de-identified, may be included in the public good asset.

Are folks from US territories eligible to apply? Yes, applicants from U.S. territories like Puerto Rico and Guam are eligible.
Who are the intended users of the public goods? We are primarily focused on technical users who would incorporate a dataset, model, or benchmark into their workflow as they build their K-12 product or service. The product or service does not need to be named as a “formative assessment product” — for example, a good AI tutor does a lot of formative assessment. We care about the process of formative assessment, not the overall product category.
A starter list of intended users includes the following (proposers can make a case to extend this list):

  • developers at a hyperscaler might use a public good benchmark to improve their “learn mode,” “student mode,” “notebook LLM,” or even their core LLM.
  • developers at an organization that offers an AI tutor for K-12 use might use public data to improve how they elicit and extend students’ knowledge around a diagram, perhaps by having data that illustrates how expert teachers work with students around their diagrams.
  • researchers may work with a company to increase the validity of a conversation-driven formative assessment relative to a commonly used benchmark or diagnostic measure; they may incorporate models (algorithms) that annotate the conversations to make it easier to investigate the conversations with respect to the benchmarks.
  • practitioners who are deciding which model(s) fit best for their local context.
  • co-design teams including researchers, educators, and a company with a large-scale product might work to pivot away from traditional dashboards to interfaces that are more supportive of teachers’ use of formative assessment data; they may use a public good to prototype conversations with teachers about formative assessment data.
As per the statement announcing this partnership, this program aims to “close critical gaps” in AI learning. What are some examples of specific gaps, and how is this partnership expected to help bridge them? Overall, the partnership’s first RFP is focused on formative assessment: understanding what students know and where they need help, then using that information to adjust instruction. The overall critical gap is that formative assessment is very common and, when well implemented, very efficacious. But it is often not well implemented, especially with AI. For example, most products and services do a poor job of asking students follow-up questions to understand what they know, and a poor job of building on the strengths a student already has. More specifically, we are working on going beyond text to include what a student draws to indicate their understanding, and on interacting with students via speech (multimodality). Whereas much AI-driven feedback is ad hoc, we will be helping to build on proven learning science principles.
Can pilot data collection happen at international school sites, or does the U.S.-based applicant requirement extend to where the research is conducted? Yes, pilot data can be collected internationally, but note that the RFP focuses on a U.S. context. There will be a higher bar to demonstrate that the collected data is still relevant and that learnings can transfer back to a U.S. context. Validation against U.S. data sets (which might not be made public) would strengthen the case.
We propose to collect a new, enriched dataset built upon our existing AI environment. Although we have gathered some preliminary data, its scope is limited and it has not yet been publicly published. Because we aim to substantially expand the dataset and collect new types of data with grant support, would this be considered Track 1? (Track 1 = proof of concept; Track 2 = enhancing an existing asset.) For edge cases, folks should be guided by the RFP description of Track 2 as projects that can “rapidly produce a public good, e.g., within 6-12 months.” If producing a public good would take longer than that, the project is likely Track 1. However, we won’t penalize a proposal based on whether it is correctly classified between Tracks 1 and 2. Note: Reviewers will move applications to the appropriate track if needed.
Can team members (other than the PI) be located outside of the U.S., e.g. a Canadian citizen working in Canada contributing to the project as a consultant? Yes, this is allowed.
Some public goods might overlap (e.g., building a benchmark requires an algorithm). Is creating multiple goods a strength, or should we focus on just one? Having multiple public goods is a strength because the goal is open-source infrastructure. However, ensure the project remains focused enough to be successful. Each individual public good will still be held to the same quality standard, so ensure that you are not sacrificing quality for quantity. Keep in mind that there will be multiple rounds of RFPs.
For a dataset, what does a “dissemination plan” look like beyond just licensing? Strength lies in demonstrating how you will bring the dataset to target end users in a way that works for them and is most likely to lead to adoption. This includes thoughts about where the data will be hosted, how users can access it, how it will be documented, etc. Focus on FAIR principles (Findable, Accessible, Interoperable, Reusable). Evidence of existing practitioner communities or working groups ready to use the data is a major plus.
If we want to address bias, can we collect new data specifically for AI training, or must the dataset already exist? You can collect new data. However, you must be very clear about your “universe” (population), your methods, and the limitations of representativeness (e.g., if you only collect data in English).
Can we focus on one specific type of assessment, like speech? Yes, one is definitely okay. Given the 6-month timeframe for some tracks, focusing on one specific domain is often more feasible than trying to cover multiple forms.
Is synthetic data acceptable? The use of synthetic data is not prohibited, but it is approached with caution. You must demonstrate a clear need for it and provide a robust validation plan to show it mirrors real-world population characteristics.
How can Gen AI be “open” when using models often incurs API/token costs (the “token toll”)? You can use open-source weights or run models locally to avoid some costs. However, if API costs are indispensable for development, you should scope and argue for those costs within your project budget.
Can we use proprietary tools (like a private agent) to produce an open dataset? (i.e. would a “Custom GPT” or “Gemini Gem” count as a public good?) Yes, this can be acceptable as long as the final public good is open. However, an application will be stronger if the methods for creating the dataset can also be made open. Datasets can be considered a public good as long as their use and development remain unrestricted. We are seeking proposals at the “infrastructure-level” and for that, a raw dataset or model weight is more generalizable.
How does “attribution” work for AI training data if an LLM can’t trace its output back to a specific source? The license applies to the dataset itself as a standalone asset, not necessarily the downstream LLM’s individual outputs. The goal is to make the asset available for others to use in various ways.
What is a “credible” scale for a dataset? Hundreds or thousands of interactions? It depends on the use case. A few hundred hours of rich tutoring transcripts might be enough, but a few hundred simple text “turns” likely wouldn’t be sufficient for AI training. If your data is on the smaller side, presenting evidence that it still enables statistically significant work will make for a stronger application.
Can data be collected at international schools if the applicant is U.S.-based? The focus is on the U.S. context. If you can demonstrate that the international data is highly relevant and transferable to the U.S. population, it may be considered, but the team needs to verify this administratively.