Apple delayed the launch of Siri’s more powerful and personal features last month. As the company looks to right the ship for future Apple Intelligence upgrades, Bloomberg highlights a shift in how Apple trains its artificial intelligence models.
The report highlights a blog entry from Apple’s Machine Learning Research site. It explains how Apple typically uses synthetic data to train its AI models. This strategy has limitations: synthetic data alone can’t “understand trends” for features like summarization and writing tools, which operate on entire emails or longer sentences. Apple will soon begin using a new technique to address this limitation. It compares synthetic data with a small sample of recent user emails without compromising user privacy:
To improve our models, we need to generate many emails that cover the topics most commonly found in messages. To curate a representative set of synthetic emails, we begin by creating a large number of synthetic messages covering a wide range of topics. For example, we might create a message like “Would you like to play tennis tomorrow at 11:30AM?” This is done without any knowledge of individual user emails. We then create a representation of each synthetic message, called an embedding, that captures key dimensions like language, topic, and length. These embeddings are then sent to a small number of devices that have opted in to Device Analytics.
Participating devices then select a small sample of recent user emails and compute their embeddings. Each device decides which synthetic embeddings are closest to these samples. Using differential privacy, Apple can then learn which synthetic embeddings are selected most frequently across all devices, without learning which synthetic embedding was selected on any given device.
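To make the idea concrete, here is a rough Python sketch of the flow described above. It is not Apple’s implementation: the article does not say which similarity measure or differential privacy mechanism Apple uses, so cosine similarity and k-ary randomized response (a standard local differential privacy technique) are swapped in as assumptions. The function names, the `epsilon` parameter, and the choice to score each synthetic embedding by its mean similarity to the device’s samples are all hypothetical.

```python
import math
import random
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_synthetic(sample_embeddings, synthetic_embeddings):
    """On-device step: index of the synthetic embedding with the highest
    mean similarity to the device's sampled email embeddings (assumed rule)."""
    def mean_similarity(i):
        total = sum(cosine_similarity(s, synthetic_embeddings[i])
                    for s in sample_embeddings)
        return total / len(sample_embeddings)
    return max(range(len(synthetic_embeddings)), key=mean_similarity)


def randomized_report(true_index, num_choices, epsilon, rng):
    """k-ary randomized response: report the true index with probability
    e^eps / (e^eps + k - 1), otherwise a uniformly random other index."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + num_choices - 1)
    if rng.random() < p_truth:
        return true_index
    others = [i for i in range(num_choices) if i != true_index]
    return rng.choice(others)


def estimate_counts(reports, num_choices, epsilon):
    """Server step: debias the noisy tallies to estimate how many devices
    truly selected each synthetic embedding."""
    n = len(reports)
    denom = math.exp(epsilon) + num_choices - 1
    p = math.exp(epsilon) / denom  # probability a device reports the truth
    q = 1.0 / denom                # probability of each specific wrong index
    observed = Counter(reports)
    return [(observed[i] - n * q) / (p - q) for i in range(num_choices)]


if __name__ == "__main__":
    rng = random.Random(0)
    synthetic = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # toy 2-D embeddings
    # Each device holds one sampled email embedding in this toy setup.
    devices = ([[[0.9, 0.1]]] * 600 + [[[0.1, 0.9]]] * 300
               + [[[0.6, 0.8]]] * 100)
    reports = [
        randomized_report(closest_synthetic(samples, synthetic),
                          len(synthetic), 2.0, rng)
        for samples in devices
    ]
    print(estimate_counts(reports, len(synthetic), 2.0))
```

In this sketch, any single report may be deliberately wrong, so the server cannot be sure what a given device selected, yet the debiased totals still approximate the overall popularity of each synthetic embedding. A smaller `epsilon` adds more noise and gives stronger privacy at the cost of less accurate estimates.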
The most frequently selected synthetic embeddings can then be used to generate training or testing data, or we can run additional curation steps to further refine the dataset. For example, if the message about playing tennis is one of the top embeddings, a similar message replacing “tennis” with “soccer” could be generated and added to the dataset for the next round. This allows us to improve the topics and language of our synthetic emails. We can then train our models to produce better text outputs for features like email summaries while protecting privacy.
Bloomberg reports that Apple will implement this new system as part of a future beta version of iOS 18.5 and macOS 15.5. For more information, you can read Apple’s complete blog post.