During a test, a Marine unit in the Pacific used generative AI not only to collect intelligence but also to interpret it. Routine intelligence work is just the beginning.
During much of the past year, US servicemen from the 15th Marine Expeditionary Unit were aboard three ships sailing throughout the Pacific, conducting training exercises off the coasts of South Korea, the Philippines, and India. On board was also an experiment: Marines in the unit responsible for sorting through foreign intelligence and informing their superiors of local threats were testing a generative AI tool funded by the Pentagon.
Two officers tell us that they used the new system to help scour thousands of pieces of open-source intelligence (nonclassified articles, reports, images, and videos) collected in the various countries where they operated, and that it did so far faster than the old method of analyzing them manually. Captain Kristin Enzenauer, for example, used large language models to translate and summarize foreign news sources, while Captain Will Lowdon used AI to help write his daily and weekly intelligence reports for his commanders. “We still need to validate the source,” Lowdon says.
He says the unit’s commanders encouraged the use of large language models because they are far more efficient in a dynamic situation.
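As a rough illustration of the kind of workflow the officers describe (not the unit’s or Vannevar Labs’ actual tooling), a minimal sketch of summarizing and translating a foreign-language article with a general-purpose LLM API might look like the following, assuming the OpenAI Python client and an invented prompt:

```python
# Illustrative sketch only: the model name, prompt, and workflow are assumptions,
# not the Marines' or Vannevar Labs' actual system.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def summarize_and_translate(article_text: str, source_language: str) -> str:
    """Return a short English summary of a foreign-language news article."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model; any capable chat model would do
        messages=[
            {
                "role": "system",
                "content": (
                    "Translate the following news article into English and "
                    "summarize it in three bullet points. Flag any claims "
                    "that should be checked against the original source."
                ),
            },
            {
                "role": "user",
                "content": f"Source language: {source_language}\n\nArticle:\n{article_text}",
            },
        ],
    )
    return response.choices[0].message.content
```

Even with a pipeline like this, the output is only a draft for a human analyst, who, as Lowdon notes, still has to validate the underlying sources.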
These tools were developed by the defense-tech firm Vannevar Labs. In November, the Pentagon’s Defense Innovation Unit awarded the company a production contract worth up to $99 million. Founded in 2019 by veterans of the CIA and the US intelligence community, the company joins the likes of Palantir, Anduril, and Scale AI, which have benefited from the US military’s embrace of artificial intelligence. The US military has been developing computer vision models and similar AI tools, like those used in Project Maven, since 2017, but generative AI (tools that can engage in humanlike conversation, like those built by Vannevar Labs) represents a newer frontier.
Vannevar Labs applies large language models from OpenAI and Microsoft, as well as some of its own, to the troves of open-source intelligence it has been collecting since 2020. The scale of this data collection is what makes Vannevar’s products distinctive: terabytes of data in 80 languages are gathered every day from 180 countries. The company says it can analyze social media profiles, get past firewalls to reach information that is hard to access online, and draw on nonclassified data gathered by human agents on the ground. It also uses reports from physical sensors that covertly monitor radio waves to detect illegal shipping.
Vannevar builds AI models that translate information, detect threats, and analyze political sentiment, with the results delivered through a ChatGPT-like chatbot interface. The goal is to provide customers with critical information on topics such as China’s efforts to secure rare earth minerals in the Philippines and international fentanyl supply chains.
“Our real focus as a business is to collect data, make sense out of that data, and then help the US to make good decisions,” says Scott Philips, Vannevar Labs’ chief technology officer. That approach is especially appealing to the US intelligence apparatus because the world is awash with more data than humans can possibly interpret. That problem led to the founding of Palantir in 2003, a company whose market value now exceeds $200 billion and which is known for its powerful and controversial tools.
In 2019, Vannevar saw a unique opportunity to use large language models, which were then new to the market, as a solution to the data conundrum. The technology could allow AI not only to collect data but also to talk a person through an analysis interactively.
Vannevar’s tools proved useful during the deployment in the Pacific. Enzenauer and Lowdon report that, while they were instructed by Vannevar to always double-check the AI’s work, inaccuracies were not a major issue. Enzenauer regularly used the tool to track foreign news reports that mentioned the unit’s exercises and to perform sentiment analysis, detecting the opinions and emotions expressed in text. On previous deployments, she had to judge manually whether a foreign article reflected a friendly or threatening opinion toward the unit.
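The sentiment-analysis step Enzenauer describes can be approximated by treating a general-purpose LLM as a simple classifier. The sketch below is illustrative only; the labels, prompt, and model are assumptions, not Vannevar Labs’ actual models:

```python
# Illustrative sketch: classify a translated article as friendly, neutral,
# or hostile toward a visiting unit. Labels and prompt are invented here.
from openai import OpenAI

client = OpenAI()

LABELS = ("friendly", "neutral", "hostile")


def classify_sentiment(article_text: str) -> str:
    """Ask the model for a single-word sentiment label and normalize it."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model
        temperature=0,   # keep the output as stable as possible for classification
        messages=[
            {
                "role": "system",
                "content": (
                    "Classify the sentiment of the following article toward a "
                    "visiting US military unit. Answer with exactly one word: "
                    "friendly, neutral, or hostile."
                ),
            },
            {"role": "user", "content": article_text},
        ],
    )
    label = response.choices[0].message.content.strip().lower()
    return label if label in LABELS else "unclear"  # anything else goes to human review
```

Pinning the temperature to zero and constraining the answer to a fixed label set makes the output easier for an analyst to audit by hand.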
Enzenauer says the majority of that work used to be done by hand: translating, coding, and analyzing the data. “It was definitely more time-consuming than using the AI.”

In February, after the deployment, the unit’s commander, Colonel Sean Dynan, said that this use of generative AI was just the “tip of the iceberg.” In December, the Pentagon announced that it would spend $100 million over the next two years on pilots for generative AI applications, and Microsoft and Palantir are also working on AI models that would use classified data. Israel, for example, has used AI to sort information and generate lists of targets during its war in Gaza, a practice that has been widely criticized.

Heidy Khlaaf, chief AI scientist at the AI Now Institute, has extensive experience leading safety audits of AI-powered systems. She says this rush to integrate generative AI into military decision-making ignores other fundamental flaws: “We’re aware of how LLMs can be highly inaccurate, particularly in the context of safety-critical applications, which require precision.” Nor, she says, is a “human in the loop” always an effective mitigation: when an AI model relies on thousands of data points, it is impossible for a person to assess the accuracy of its conclusions.
Sentiment analysis is a task AI still hasn’t mastered. Philips, Vannevar’s CTO, says the company has built models specifically to determine whether an article is pro-US or not, but MIT Technology Review was not able to evaluate them. Chris Mouton, a senior engineer at RAND, recently tested how well suited generative AI is to this task, comparing leading models, including OpenAI’s GPT-4 and an older version fine-tuned for intelligence work, against human experts. He says AI struggles to identify subtler types of propaganda, though he adds that the models can still be useful for a variety of other analysis tasks.

Khlaaf also says that Vannevar’s approach is limited by the reliability of open-source intelligence itself. While Mouton considers the open-source data “pretty exceptional,” Khlaaf notes that because it is exposed to the internet, it is more vulnerable to misinformation campaigns, botnets, and deliberate manipulation, as the US Army has warned.
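Mouton’s test amounts to checking how often a model’s judgments match those of human experts. A minimal sketch of that kind of agreement calculation, on invented labels rather than RAND’s actual data or code, might look like this:

```python
# Illustrative sketch: measure how often model labels agree with expert labels.
# The label set and example data are invented for demonstration.
from typing import Sequence


def accuracy(model_labels: Sequence[str], expert_labels: Sequence[str]) -> float:
    """Fraction of articles where the model agrees with the human expert."""
    if len(model_labels) != len(expert_labels):
        raise ValueError("label lists must be the same length")
    matches = sum(m == e for m, e in zip(model_labels, expert_labels))
    return matches / len(expert_labels)


# Example: four articles labeled by a model and by a human analyst.
model = ["propaganda", "benign", "benign", "propaganda"]
expert = ["propaganda", "benign", "propaganda", "propaganda"]
print(f"agreement with expert: {accuracy(model, expert):.0%}")  # -> 75%
```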
Mouton believes that the biggest question is whether these generative AI tools will be used by analysts as one of many investigative tools or whether they will produce the subjective analysis on which decisions are based. “This is the core debate,” he says.
Everyone agrees that the AI models are accessible: you can ask them a simple question about complex pieces of intelligence, and they will respond in plain English. What remains unclear is which imperfections are acceptable in the name of efficiency.
This story has been updated to include context from Heidy Khlaaf.