A Brief History of Neural Networks by Michael A. Nielsen
This concise history traces the evolution of neural networks from their inception to modern advancements. It discusses how recent improvements in network architecture and training have boosted performance in areas like image recognition, natural language processing, and robotics.
A Framework for US AI Governance (MIT Policy Briefs)
MIT’s policy brief proposes a governance framework for AI in the United States. It suggests leveraging existing regulatory structures to oversee AI applications responsibly, balancing innovation leadership with effective risk management and ethical considerations.
Accelerate AI With Annotated Data
Learn how leading companies use data annotation to propel their machine learning projects. This whitepaper demonstrates how high-quality, annotated datasets speed up AI development, improve model accuracy, and help organizations unlock the true potential of their AI initiatives.
Achieving AI ROI Through Training Data Diversity
This document addresses why many AI projects struggle with ROI. It highlights the pitfalls of limited, homogenous datasets and offers solutions like synthetic data generation, combining multiple data sources, and active learning to create diverse, robust training sets that enhance AI model accuracy and return on investment.
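As a rough illustration of one technique mentioned above, the sketch below shows pool-based active learning with uncertainty sampling in Python; the dataset, model choice, and query batch size are illustrative assumptions, not details from the whitepaper.

```python
# Minimal sketch of pool-based active learning with uncertainty sampling.
# Illustrative only; the dataset, model, and batch size are assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
labeled = np.zeros(len(X), dtype=bool)
labeled[np.random.default_rng(0).choice(len(X), 50, replace=False)] = True

model = LogisticRegression(max_iter=1000)
for round_ in range(5):
    model.fit(X[labeled], y[labeled])
    pool = np.flatnonzero(~labeled)
    probs = model.predict_proba(X[pool])
    uncertainty = 1.0 - probs.max(axis=1)          # least-confident sampling
    query = pool[np.argsort(uncertainty)[-20:]]    # 20 most uncertain points
    labeled[query] = True                          # "annotate" the queried points
print(f"labeled examples after 5 rounds: {labeled.sum()}")
```

In practice the queried points would be sent to human annotators rather than labeled automatically; the loop structure stays the same.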
Accelerating Sustainability with AI: A Playbook by Microsoft
Microsoft’s playbook discusses AI’s role in tackling climate change and enhancing sustainability. It outlines how AI can improve predictions for wildfires, renewable energy, and resilient agriculture, and offers a five-point strategy for investing in sustainable AI solutions, data infrastructure, and workforce development.
Adaptation of Large Foundation Models by Google
Google’s whitepaper details adapter tuning methods for customizing large pre-trained AI models. It explains how adapter modules allow efficient fine-tuning for specific tasks, addresses security and privacy considerations, and outlines design principles for robust model adaptation on Google Cloud’s Vertex AI platform.
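To make the adapter idea concrete, here is a minimal, hedged sketch of a bottleneck adapter layer in PyTorch; the bottleneck size, activation, and tensor shapes are illustrative assumptions rather than details from Google's whitepaper or Vertex AI.

```python
# Minimal sketch of an adapter module: a small bottleneck inserted into a
# frozen backbone, so only the adapter's parameters are trained.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)   # project down
        self.up = nn.Linear(bottleneck_dim, hidden_dim)     # project back up
        self.act = nn.GELU()

    def forward(self, x):
        # Residual connection preserves the frozen backbone's representation
        # and adds only a small learned correction.
        return x + self.up(self.act(self.down(x)))

# During adaptation, the backbone is frozen and only adapter weights update.
backbone_hidden = torch.randn(8, 128, 768)   # (batch, tokens, hidden) from a frozen model
adapter = Adapter(hidden_dim=768)
out = adapter(backbone_hidden)
print(out.shape)  # torch.Size([8, 128, 768])
```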
Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig
A foundational resource that examines the creation of intelligent machines, this textbook covers a breadth of AI methods such as machine learning, evolutionary computation, and neural networks. It provides an in-depth understanding of how these approaches enable computers to mimic human intelligence and decision-making.
Before You Take Off With Microsoft 365 Copilot, Don’t Skip This Essential Preflight Adoption Guide
This article serves as a pre-deployment checklist for Microsoft 365 Copilot. It explores use cases, readiness strategies, and practical steps to maximize ROI and ensure a smooth adoption process for businesses planning to integrate Copilot into their workflows.
Best Practices to Train Voice Bots
As voice bot technology expands across customer service, sales, and marketing, this paper outlines essential training strategies. It emphasizes using diverse, high-quality audio data—including various accents, dialects, and noise levels—and routine testing to ensure reliable real-world performance.
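As one concrete example of the kind of augmentation the paper recommends, the sketch below mixes background noise into a speech signal at a chosen signal-to-noise ratio; the synthetic signals and the SNR value are illustrative assumptions, not prescriptions from the paper.

```python
# Minimal sketch of one audio-augmentation step: adding background noise
# at a target signal-to-noise ratio (in dB). Signals here are synthetic.
import numpy as np

def add_noise(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix noise into speech so the result has the requested SNR."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.linspace(0, 1, 16000))   # stand-in for speech
noisy = add_noise(clean, rng.normal(size=16000), snr_db=10)  # 10 dB SNR mix
```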
Coding on Copilot by William Harding and Matthew Kloster
This study investigates GitHub Copilot’s effect on code quality, revealing that while it accelerates code generation, it may also increase code churn and copy-paste practices. It raises important questions about maintainability and suggests that experienced developers remain cautious about AI-suggested code changes.
Data Mining: Practical Machine Learning Tools and Techniques by Witten and Frank
This practical resource delves into data mining and machine learning. It reviews key techniques such as decision trees, neural networks, clustering, and association rules, while also addressing common challenges and applications in business intelligence, fraud detection, and scientific research.
Deep Learning by Yoshua Bengio
This influential paper explores the field of deep learning, focusing on artificial neural networks loosely inspired by the human brain. It explains how deep learning techniques extract intricate patterns from large, unstructured datasets, leading to advances in computer vision, natural language processing, and robotics.
Elements of Statistical Learning by Hastie, Tibshirani, and Friedman
This extensive guide covers statistical learning techniques such as linear regression, classification, and decision trees. It introduces a framework for modeling data and making predictions, and explores advanced methods like boosting and support vector machines, making it a cornerstone resource for data-driven analysis.
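For a quick, hands-on flavor of two of the methods the book covers, the sketch below cross-validates a gradient boosting classifier and a support vector machine on a toy dataset; the dataset and hyperparameters are illustrative, not drawn from the book.

```python
# Minimal sketch comparing boosting and an SVM on a synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
for name, clf in [("boosting", GradientBoostingClassifier(random_state=0)),
                  ("svm", SVC(kernel="rbf"))]:
    score = cross_val_score(clf, X, y, cv=5).mean()   # 5-fold cross-validation
    print(f"{name}: {score:.3f}")
```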
Introduction to Machine Learning by Ethem Alpaydin
An entry-level guide to machine learning, this document explains how algorithms learn from data—whether labeled, unlabeled, or through trial and error. It covers fundamental concepts in supervised, unsupervised, and reinforcement learning, highlighting the goal of automating knowledge extraction for improved decision-making.
Modernizing MDM: Unify and Mobilize Trusted Data in Real Time
This paper explores modern Master Data Management strategies to overcome legacy challenges. It discusses how unified, real-time data solutions can drive cost savings, productivity gains, and improved decision-making across organizations.
Pattern Recognition and Machine Learning by Christopher M. Bishop
Bishop’s book offers a broad look at how machines can automatically identify patterns in data. Covering supervised and unsupervised learning, feature selection, and more, it explains the underlying principles that allow computers to learn from and make sense of complex datasets.
Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto
An accessible primer on reinforcement learning, this document covers how agents learn optimal actions through trial and error to maximize rewards. It explains core concepts, real-world applications, and its ties to dynamic programming, providing a solid grounding in this learning paradigm.
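To ground the trial-and-error idea, here is a minimal tabular Q-learning sketch on a toy chain environment; the environment, optimistic initialization, and hyperparameters are illustrative assumptions, not taken from Sutton and Barto's text.

```python
# Minimal tabular Q-learning on a toy chain: the agent starts at state 0 and
# earns a reward of 1 only by reaching the rightmost state.
import numpy as np

n_states, n_actions = 6, 2               # actions: 0 = left, 1 = right
q = np.ones((n_states, n_actions))       # optimistic init encourages exploration
alpha, gamma, epsilon = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0
    for _ in range(100):                 # step cap keeps episodes bounded
        a = rng.integers(n_actions) if rng.random() < epsilon else int(q[s].argmax())
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Temporal-difference update; terminal states bootstrap to zero.
        target = r if s_next == n_states - 1 else r + gamma * q[s_next].max()
        q[s, a] += alpha * (target - q[s, a])
        s = s_next
        if s == n_states - 1:
            break

print(np.round(q, 2))                    # learned values favor moving right
```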
Statistical Learning: Data Mining, Inference, and Prediction by Hastie, Tibshirani, and Friedman
Focusing on statistical models, this book explains how to find structure in data through methods like regression and decision trees. It discusses how to formalize relationships among variables, extract patterns, and make predictions, relevant to fields from biology to finance.
Ventana White Paper: Enhancing Business Agility with Trusted Data Products
Focusing on AI-powered data unification, this whitepaper explains how breaking down data silos and creating accurate customer profiles can boost business agility. It provides insights on leveraging trusted data products for accurate segmentation and informed decision-making.
Whitepaper: Prepare Your Environment for Microsoft 365 Copilot
This guide offers a deep dive into securing and optimizing an environment for deploying Microsoft 365 Copilot. It outlines why upgrading to Zero Trust and Microsoft E5 is critical and provides actionable strategies to prepare IT infrastructure for safe, effective Copilot integration.
Whitepaper: When “As Secure As Possible” Isn’t Enough
Addressing evolving cybersecurity threats, this whitepaper explains how upgrading from Microsoft E3 to E5 can help organizations stay ahead of bad actors. It discusses enhanced security features and the importance of robust measures in the face of advanced cyber threats.
2023 Papers
Computer Vision
- 01/2023: Muse: Text-To-Image Generation via Masked Generative Transformers (Muse)
- 02/2023: Structure and Content-Guided Video Synthesis with Diffusion Models (Gen-1)
- 02/2023: Scaling Vision Transformers to 22 Billion Parameters (ViT 22B)
- 02/2023: Adding Conditional Control to Text-to-Image Diffusion Models (ControlNet)
- 03/2023: Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models (Visual ChatGPT)
- 03/2023: Scaling up GANs for Text-to-Image Synthesis (GigaGAN)
- 04/2023: Segment Anything (SAM)
- 04/2023: DINOv2: Learning Robust Visual Features without Supervision (DINOv2)
- 04/2023: Visual Instruction Tuning
- 04/2023: Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models (VideoLDM)
- 04/2023: Synthetic Data from Diffusion Models Improves ImageNet Classification
- 04/2023: Segment Anything in Medical Images (MedSAM)
- 05/2023: Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold (DragGAN)
- 06/2023: Neuralangelo: High-Fidelity Neural Surface Reconstruction (Neuralangelo)
- 07/2023: SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis (SDXL)
- 08/2023: 3D Gaussian Splatting for Real-Time Radiance Field Rendering
- 08/2023: Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization… (Qwen-VL)
- 08/2023: MVDream: Multi-view Diffusion for 3D Generation (MVDream)
- 11/2023: Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks (Florence-2)
- 12/2023: VideoPoet: A Large Language Model for Zero-Shot Video Generation (VideoPoet)
NLP
- 01/2023: DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature (DetectGPT)
- 02/2023: Toolformer: Language Models Can Teach Themselves to Use Tools (Toolformer)
- 02/2023: LLaMA: Open and Efficient Foundation Language Models (LLaMA)
- 03/2023: GPT-4
- 03/2023: Sparks of Artificial General Intelligence: Early experiments with GPT-4 (GPT-4 Eval)
- 03/2023: HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace (HuggingGPT)
- 03/2023: BloombergGPT: A Large Language Model for Finance (BloombergGPT)
- 04/2023: Instruction Tuning with GPT-4
- 04/2023: Generative Agents: Interactive Simulacra of Human Behavior (Gen Agents)
- 05/2023: PaLM 2 Technical Report (PaLM-2)
- 05/2023: Tree of Thoughts: Deliberate Problem Solving with Large Language Models (ToT)
- 05/2023: LIMA: Less Is More for Alignment (LIMA)
- 05/2023: QLoRA: Efficient Finetuning of Quantized LLMs (QLoRA)
- 05/2023: Voyager: An Open-Ended Embodied Agent with Large Language Models (Voyager)
- 07/2023: ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs (ToolLLM)
- 08/2023: MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework (MetaGPT)
- 08/2023: Code Llama: Open Foundation Models for Code (Code Llama)
- 09/2023: RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback (RLAIF)
- 09/2023: Large Language Models as Optimizers (OPRO)
- 10/2023: Eureka: Human-Level Reward Design via Coding Large Language Models (Eureka)
- 12/2023: Mathematical discoveries from program search with large language models (FunSearch)
Audio Processing
- 01/2023: Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers (VALL-E)
- 01/2023: MusicLM: Generating Music From Text (MusicLM)
- 01/2023: AudioLDM: Text-to-Audio Generation with Latent Diffusion Models (AudioLDM)
- 03/2023: Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages (USM)
- 05/2023: Scaling Speech Technology to 1,000+ Languages (MMS)
- 06/2023: Simple and Controllable Music Generation (MusicGen)
- 06/2023: AudioPaLM: A Large Language Model That Can Speak and Listen (AudioPaLM)
- 06/2023: Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale (Voicebox)
Multimodal Learning
- 02/2023: Language Is Not All You Need: Aligning Perception with Language Models (Kosmos-1)
- 03/2023: PaLM-E: An Embodied Multimodal Language Model (PaLM-E)
- 04/2023: AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head (AudioGPT)
- 05/2023: ImageBind: One Embedding Space To Bind Them All (ImageBind)
- 07/2023: Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning (CM3Leon)
- 07/2023: Meta-Transformer: A Unified Framework for Multimodal Learning (Meta-Transformer)
- 08/2023: SeamlessM4T: Massively Multilingual & Multimodal Machine Translation (SeamlessM4T)
Reinforcement Learning
- 01/2023: Mastering Diverse Domains through World Models (DreamerV3)
- 02/2023: Grounding Large Language Models in Interactive Environments with Online RL (GLAM)
- 02/2023: Efficient Online Reinforcement Learning with Offline Data (RLPD)
- 03/2023: Reward Design with Language Models
- 05/2023: Direct Preference Optimization: Your Language Model is Secretly a Reward Model (DPO)
- 06/2023: Faster sorting algorithms discovered using deep reinforcement learning (AlphaDev)
- 08/2023: Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization (Retroformer)
Other Papers
- 02/2023: Symbolic Discovery of Optimization Algorithms (Lion)
- 07/2023: RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control (RT-2)
- 11/2023: Scaling deep learning for materials discovery (GNoME)
- 12/2023: Discovery of a structural class of antibiotics with explainable deep learning
2022 Papers
Computer Vision
- 01/2022: A ConvNet for the 2020s (ConvNeXt)
- 01/2022: Patches Are All You Need (ConvMixer)
- 02/2022: Block-NeRF: Scalable Large Scene Neural View Synthesis (Block-NeRF)
- 03/2022: DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection (DINO)
- 03/2022: Scaling Up Your Kernels to 31×31: Revisiting Large Kernel Design in CNNs (Large Kernel CNN)
- 03/2022: TensoRF: Tensorial Radiance Fields (TensoRF)
- 04/2022: MaxViT: Multi-Axis Vision Transformer (MaxViT)
- 04/2022: Hierarchical Text-Conditional Image Generation with CLIP Latents (DALL-E 2)
- 05/2022: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding (Imagen)
- 05/2022: GIT: A Generative Image-to-text Transformer for Vision and Language (GIT)
- 06/2022: CMT: Convolutional Neural Networks Meet Vision Transformers (CMT)
- 07/2022: Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors… (Swin UNETR)
- 07/2022: Classifier-Free Diffusion Guidance
- 08/2022: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation (DreamBooth)
- 09/2022: DreamFusion: Text-to-3D using 2D Diffusion (DreamFusion)
- 09/2022: Make-A-Video: Text-to-Video Generation without Text-Video Data (Make-A-Video)
- 10/2022: On Distillation of Guided Diffusion Models
- 10/2022: LAION-5B: An open large-scale dataset for training next generation image-text models (LAION-5B)
- 10/2022: Imagic: Text-Based Real Image Editing with Diffusion Models (Imagic)
- 11/2022: Visual Prompt Tuning
- 11/2022: Magic3D: High-Resolution Text-to-3D Content Creation (Magic3D)
- 11/2022: DiffusionDet: Diffusion Model for Object Detection (DiffusionDet)
- 11/2022: InstructPix2Pix: Learning to Follow Image Editing Instructions (InstructPix2Pix)
- 12/2022: Multi-Concept Customization of Text-to-Image Diffusion (Custom Diffusion)
- 12/2022: Scalable Diffusion Models with Transformers (DiT)
NLP
- 01/2022: LaMDA: Language Models for Dialog Applications (LaMDA)
- 01/2022: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (CoT)
- 02/2022: Competition-Level Code Generation with AlphaCode (AlphaCode)
- 02/2022: Finetuned Language Models Are Zero-Shot Learners (FLAN)
- 03/2022: Training language models to follow instructions with human feedback (InstructGPT)
- 03/2022: Multitask Prompted Training Enables Zero-Shot Task Generalization (T0)
- 03/2022: Training Compute-Optimal Large Language Models (Chinchilla)
- 04/2022: Do As I Can, Not As I Say: Grounding Language in Robotic Affordances (SayCan)
- 04/2022: GPT-NeoX-20B: An Open-Source Autoregressive Language Model (GPT-NeoX)
- 04/2022: PaLM: Scaling Language Modeling with Pathways (PaLM)
- 06/2022: Beyond the Imitation Game: Quantifying and extrapolating the capabilities of lang… (BIG-bench)
- 06/2022: Solving Quantitative Reasoning Problems with Language Models (Minerva)
- 10/2022: ReAct: Synergizing Reasoning and Acting in Language Models (ReAct)
- 11/2022: BLOOM: A 176B-Parameter Open-Access Multilingual Language Model (BLOOM)
- 11/2022: Optimizing Language Models for Dialogue (ChatGPT)
- 12/2022: Large Language Models Encode Clinical Knowledge (Med-PaLM)
Audio Processing
- 02/2022: mSLAM: Massively multilingual joint pre-training for speech and text (mSLAM)
- 02/2022: ADD 2022: the First Audio Deep Synthesis Detection Challenge (ADD)
- 03/2022: Efficient Training of Audio Transformers with Patchout (PaSST)
- 04/2022: MAESTRO: Matched Speech Text Representations through Modality Matching (Maestro)
- 05/2022: SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language… (SpeechT5)
- 06/2022: WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing (WavLM)
- 07/2022: BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for ASR (BigSSL)
- 08/2022: MuLan: A Joint Embedding of Music Audio and Natural Language (MuLan)
- 09/2022: AudioLM: a Language Modeling Approach to Audio Generation (AudioLM)
- 09/2022: AudioGen: Textually Guided Audio Generation (AudioGen)
- 10/2022: High Fidelity Neural Audio Compression (EnCodec)
- 12/2022: Robust Speech Recognition via Large-Scale Weak Supervision (Whisper)
Multimodal Learning
- 01/2022: BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language… (BLIP)
- 02/2022: data2vec: A General Framework for Self-supervised Learning in Speech, Vision and… (Data2vec)
- 03/2022: VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks (VL-Adapter)
- 04/2022: Winoground: Probing Vision and Language Models for Visio-Linguistic… (Winoground)
- 04/2022: Flamingo: a Visual Language Model for Few-Shot Learning (Flamingo)
- 05/2022: A Generalist Agent (Gato)
- 05/2022: CoCa: Contrastive Captioners are Image-Text Foundation Models (CoCa)
- 05/2022: VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts (VLMo)
- 08/2022: Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks (BEiT-3)
- 09/2022: PaLI: A Jointly-Scaled Multilingual Language-Image Model (PaLI)
Reinforcement Learning
- 01/2022: Learning robust perceptive locomotion for quadrupedal robots in the wild
- 02/2022: BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning
- 02/2022: Outracing champion Gran Turismo drivers with deep reinforcement learning (Sophy)
- 02/2022: Magnetic control of tokamak plasmas through deep reinforcement learning
- 08/2022: Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning (ANYmal)
- 10/2022: Discovering faster matrix multiplication algorithms with reinforcement learning (AlphaTensor)
Other Papers
- 02/2022: FourCastNet: A Global Data-driven High-resolution Weather Model… (FourCastNet)
- 05/2022: ColabFold: making protein folding accessible to all (ColabFold)
- 06/2022: Measuring and Improving the Use of Graph Information in GNN
- 10/2022: TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis (TimesNet)
- 12/2022: RT-1: Robotics Transformer for Real-World Control at Scale (RT-1)
Historical Papers
- 1958: Perceptron: A probabilistic model for information storage and organization in the brain (Perceptron)
- 1986: Learning representations by back-propagating errors (Backpropagation)
- 1986: Induction of decision trees (ID3)
- 1989: A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition (HMM)
- 1989: Multilayer feedforward networks are universal approximators
- 1992: A training algorithm for optimal margin classifiers (SVM)
- 1996: Bagging predictors
- 1998: Gradient-based learning applied to document recognition (CNN/GTN)
- 2001: Random Forests
- 2001: A fast and elitist multiobjective genetic algorithm (NSGA-II)
- 2003: Latent Dirichlet Allocation (LDA)
- 2006: Reducing the Dimensionality of Data with Neural Networks (Autoencoder)
- 2008: Visualizing Data using t-SNE (t-SNE)
- 2009: ImageNet: A large-scale hierarchical image database (ImageNet)
- 2012: ImageNet Classification with Deep Convolutional Neural Networks (AlexNet)
- 2013: Efficient Estimation of Word Representations in Vector Space (Word2vec)
- 2013: Auto-Encoding Variational Bayes (VAE)
- 2014: Generative Adversarial Networks (GAN)
- 2014: Dropout: A Simple Way to Prevent Neural Networks from Overfitting (Dropout)
- 2014: Sequence to Sequence Learning with Neural Networks
- 2014: Neural Machine Translation by Jointly Learning to Align and Translate (RNNSearch-50)
- 2014: Adam: A Method for Stochastic Optimization (Adam)
- 2015: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Cov… (BatchNorm)
- 2015: Going Deeper With Convolutions (Inception)
- 2015: Human-level control through deep reinforcement learning (Deep Q Network)
- 2015: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (Faster R-CNN)
- 2015: U-Net: Convolutional Networks for Biomedical Image Segmentation (U-Net)
- 2015: Deep Residual Learning for Image Recognition (ResNet)
- 2016: You Only Look Once: Unified, Real-Time Object Detection (YOLO)
- 2017: Attention Is All You Need (Transformer)
- 2018: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (BERT)
- 2020: Language Models are Few-Shot Learners (GPT-3)
- 2020: Denoising Diffusion Probabilistic Models (DDPM)
- 2020: An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale (ViT)
- 2021: Highly accurate protein structure prediction with AlphaFold (AlphaFold)
- 2022: ChatGPT: Optimizing Language Models For Dialogue (ChatGPT)