Reinforcement Learning From Human Feedback (RLHF) in Large Language Models (LLMs)
Source: Grok AI-generated illustration

What is RLHF?

RLHF is a machine learning (ML) technique in which an AI model improves by learning directly from human feedback. It is used to align AI models with human goals, ethics, and preferences. It uses human feedback to optimize LLMs to s…
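To make the idea concrete, below is a minimal sketch of the reward-modeling step at the heart of RLHF: a scorer is trained on pairs of responses so that the one humans preferred receives the higher reward. The `RewardModel` class, the toy embeddings, and the pairwise loss setup here are illustrative stand-ins, not a production implementation.

```python
# A minimal sketch of RLHF reward modeling: given pairs of responses
# where humans preferred one over the other, train a scorer so the
# preferred response gets the higher reward. The feature vectors and
# RewardModel below are hypothetical stand-ins for real LLM states.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Maps a response embedding to a single scalar reward."""
    def __init__(self, dim: int):
        super().__init__()
        self.head = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(x).squeeze(-1)

dim = 16
model = RewardModel(dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy preference data: embeddings of (chosen, rejected) response pairs.
chosen = torch.randn(32, dim)
rejected = torch.randn(32, dim)

for step in range(100):
    r_chosen, r_rejected = model(chosen), model(rejected)
    # Bradley-Terry pairwise loss: push the reward of the
    # human-preferred response above that of the rejected one.
    loss = -F.logsigmoid(r_chosen - r_rejected).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Once such a reward model is trained, it stands in for human judgment during the reinforcement learning stage, scoring the LLM's outputs so the policy can be optimized toward responses people actually prefer.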