GET GOING FAST

10/13/24 - SD Weekly Roundup - Key Highlights and Update

Added 2024-10-14 02:02:19 +0000 UTC


This audio podcast will be available every Sunday. Check in every week!

(This podcast was made with Google NotebookLM from original content by Diffusion Digest.)

Be sure to support them and sign up for the digital digest at: https://diffusiondigest.beehiiv.com/

In this episode:
Interesting find of the week: In a new YouTube video, Runway CEO Cristóbal Valenzuela discusses AI's impact on the film and creative industries, how the industry is adopting it, and his vision for AI in creativity. Notable mentions:

Valenzuela views AI video generation as a "new camera," opening up novel forms of artistic expression and storytelling.
He asserts that humans using AI tools should retain ownership of their creations, similar to using any other creative tool.
Valenzuela predicts that an AI-assisted or AI-generated film could win an Oscar or become a box office hit as early as next year.

YouTube Source


REMSPACE: DREAMING IN CODE

California neurotechnology startup REMspace says it has achieved two-way communication with people during dreams. The company claims volunteers in its experiments were able to realize they were dreaming and exchange simple information with the outside world. Participants reportedly received commands from a server while dreaming and successfully responded.

REMspace used "specially designed equipment" including a server, an apparatus, Wi-Fi, and sensors. Two study participants slept in separate homes while their brain waves were tracked remotely. Once the server detected that one participant had entered a lucid dream, it generated a random word and transmitted it via earbuds. Eight minutes later, the second participant entered a lucid dream, and the server transmitted the stored message, which that participant repeated upon awakening.
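
The relay described above amounts to store-and-forward messaging. The following is a minimal Python sketch of that idea only, assuming lucid-dream detection is reduced to a single event call. The class, method names, and word list are hypothetical and do not represent REMspace's software.

```python
import random

# Hypothetical vocabulary; the article does not say how words are chosen.
WORDS = ["river", "lantern", "orbit", "maple"]


class DreamRelay:
    """Toy store-and-forward relay between two sleeping participants."""

    def __init__(self, rng=None):
        self.rng = rng or random.Random()
        self.sender = None       # participant who received the original word
        self.stored_word = None  # word waiting to be relayed

    def on_lucid_dream(self, participant):
        """Handle a (hypothetical) lucid-dream detection event.

        With nothing stored, a random word is generated, played to this
        participant, and kept. The next detection for a different
        participant relays the stored word and clears it.
        """
        if self.stored_word is None:
            self.sender = participant
            self.stored_word = self.rng.choice(WORDS)
            return ("played", self.stored_word)
        if participant == self.sender:
            # Same person again: keep waiting for the second participant.
            return ("waiting", None)
        word, self.stored_word = self.stored_word, None
        return ("relayed", word)
```

Calling on_lucid_dream("participant_1") and later on_lucid_dream("participant_2") returns the same word both times, first tagged "played" and then "relayed". The eight-minute gap in the reported experiment is simply the time between the two calls.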

REMspace CEO Michael Raduga envisions numerous applications, particularly in mental health, stating, "This opens the door to countless commercial applications, reshaping how we think about communication and interaction in the dream world." However, it's crucial to note that these experimental results await peer review and independent verification. If validated, this breakthrough could mark a significant milestone in sleep research, potentially revolutionizing mental health treatments and skills training methods.


ELEVENLABS + DEEPREEL PARTNER = AI.LONSO

ElevenLabs and DeepReel announced a collaboration with Aston Martin Aramco Formula One Team and driver Fernando Alonso to launch an AI-powered tool called Ai.lonso. This tool is designed to enhance fan engagement for the Formula One team.

Ai.lonso uses AI text-to-speech technology to read and translate content on the team's website using an AI-generated version of Fernando Alonso's voice. At launch, the functionality is available in English, Spanish, and French, with plans to add more languages in the future.

The stated goal of this technology is to make team content more accessible and to personalize fan interactions. Rob Bloom, Chief Marketing Officer of Aston Martin Aramco Formula One Team, described it as the latest step in adopting new technology to enrich the F1 fan experience.

The collaboration also involves avatar technology provided by DeepReel, which aims to create AI-driven visual representations for fan engagement.

ElevenLabs and DeepReel representatives commented on the potential of AI voice and avatar technology to maximize athlete availability and create new forms of fan engagement in sports.

This development is part of a broader trend of sports teams and organizations exploring AI technologies to enhance fan experiences and overcome language barriers in global sports markets.

ElevenLabs Source

PUT THIS ON YOUR RADAR

AI Inverse Painting: Recreating Masterpieces Step-by-Step

Researchers from the University of Washington have developed "Inverse Painting," a diffusion-based method that generates time-lapse videos showing how a painting might have been created, from a blank canvas to the final artwork.

Uses AI to reconstruct the painting process from a single input image
Trained on acrylic painting videos to learn human painting techniques
Capable of handling various artistic styles, including works like Van Gogh's
Incorporates text and region understanding to define painting instructions
Uses a novel diffusion-based renderer to update the canvas iteratively

Project Page | Research Paper | GitHub Code

DressRecon: 3D Human Models from Videos with Clothing Detail

Carnegie Mellon University researchers have developed DressRecon, an AI technology that creates detailed 3D human models from single-camera videos, capturing complex clothing and held objects.

Reconstructs 3D models from monocular video inputs
Captures intricate details of loose clothing and handheld objects
Uses a neural implicit model to separate body and clothing deformations
Leverages image-based prior knowledge for enhanced realism

Project Page | Research Paper | GitHub Code

Podcastfy: Open-Source Tool for Converting Text to Audio Podcasts

Podcastfy is an open-source Python package that transforms various text formats into multilingual audio dialogues, offering an alternative to Google's NotebookLM with enhanced customization options.

Converts web content, PDFs, and text into podcast-style audio
Uses generative AI for multilingual dialogue creation
Offers a Gradio demo and HuggingFace space for easy testing
Emphasizes programmatic use and customizable generation

GitHub Page

PMRF: Breakthrough in Image Restoration

A new algorithm called Posterior-Mean Rectified Flow (PMRF) is making waves in image processing, offering superior performance in tasks like denoising, super-resolution, and image inpainting. PMRF targets both low distortion and high perceptual quality.

Combines posterior mean prediction with a rectified flow model
Excels in multiple image restoration tasks
Scores well on metrics such as PSNR, SSIM, and FID
Produces natural-looking results with low distortion
Outperforms existing methods on various benchmarks
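
For readers unfamiliar with these metrics, PSNR is the simplest of the three: it is a log-scaled inverse of mean squared error, so higher values mean lower distortion. Below is a minimal sketch of the standard definition, assuming 8-bit pixel values passed as flat sequences. The function name is illustrative, and this is not PMRF's evaluation code.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

PSNR measures pixel-level distortion only. FID, by contrast, compares feature statistics of image sets as a proxy for perceptual quality, and lower is better. SSIM and FID need considerably more machinery and are not shown here.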

Hugging Face Spaces Demo | Project Page

WonderWorld AI: Real-Time 3D Scene Generation from a Single Image

Researchers from Stanford University and MIT have developed WonderWorld, an AI system capable of generating 3D scenes from a single image in just 10 seconds. This technology allows for real-time interaction and scene exploration, marking a significant advancement in 3D environment creation.

Generates 3D scenes in 10 seconds using an NVIDIA A6000 GPU
Allows user control over scene content and layout
Uses a three-level FLAGS representation (foreground, background, sky)
Employs guided depth diffusion to reduce geometric distortion
Outperforms previous methods in speed and visual quality
Limitations include support for forward-facing surfaces only and some visual artifacts

Project Page

Hailuo AI Launches Image-to-Video Generation Feature

Hailuo AI has introduced a new image-to-video feature that allows users to create videos using both text and image inputs. This tool offers precise object manipulation, various style options, and aims to simplify video production for creators of all skill levels.

Prompt: The video is depicted in pixel style with an 8-bit canvas showing a kitten walking down a busy street. The avatar displays the holographic inscription "Volatility, 3 September 2024" and the beautiful signature "Ludma" below.

Accepts both text descriptions and reference images as input
Provides accurate object recognition and manipulation
Offers a wide range of style options (e.g., surrealism, anime, sci-fi)
Supports complex prompts for detailed video generation
Features an intuitive interface with real-time preview
Targets video creators, marketers, and social media managers
Includes built-in material library and recommendation system

Hailuo Link

Free 3D Object Texturing Tool Using Forge and ControlNet

u/ai_happy has created a free tool for texturing 3D objects using Forge and ControlNet. The tool, now in version 2.0, includes new features like Autofill and a Re-think brush, allowing game developers to texture decorations and characters on their local PCs at no cost.

Version 2.0 introduces Autofill and Re-think brush features
Supports multiple 3D file formats including FBX, OBJ, and GLB
Handles complex models with multiple UV sets and UDIMs
Compatible with Stable Diffusion models and VAEs
Offers orthographic camera mode for capturing more surfaces

Link | Reddit Thread

Gradio: Background Removal For Videos

Image to Pixel Style Converter

A new ComfyUI workflow has been shared that transforms regular images into pixel art style, offering a range of artistic interpretations rather than simple pixelation.

Uses a combination of pixel art checkpoints and LoRAs
Workflow includes IP-Adapter for better image coherence
Results tend to have an anime-inspired aesthetic
Some outputs show significant artistic liberties (e.g., changed poses, added elements)
Free workflow available, though some users reported needing to sign up to access it
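
For contrast, "simple pixelation" usually means something like block averaging: each tile of the image is replaced by its mean color, with no reinterpretation of the content. The sketch below shows that baseline on a grayscale image stored as a list of rows. The function name is illustrative, and the workflow's checkpoints, LoRAs, and IP-Adapter are not reproduced here.

```python
def pixelate(image, block):
    """Replace each block x block tile of a grayscale image with its mean value.

    image is a list of equal-length rows of ints; a new image is returned.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, min(top + block, height))
            cols = range(left, min(left + block, width))
            tile = [image[y][x] for y in rows for x in cols]
            mean = round(sum(tile) / len(tile))
            # Paint the whole tile with the averaged value.
            for y in rows:
                for x in cols:
                    out[y][x] = mean
    return out
```

This kind of filter preserves poses and composition exactly, which is why the diffusion-based workflow, which may change poses or add elements, reads as an artistic interpretation rather than a filter.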

Reddit Thread

FacePoke: Interactive Face Expression Editor

FacePoke is a new open-source tool that allows users to manipulate facial expressions in images using a simple drag-and-drop interface. Built on LivePortrait technology, it offers real-time editing of various facial features.

Drag-and-drop interface for adjusting facial expressions
Based on LivePortrait technology
Real-time editing and preview
Open-source project available on GitHub
Web demo available for testing without installation

Hugging Face Spaces Demo | GitHub Repo | Reddit Thread

Dreamina AI V2.0

Pyramid Flow SD3: New Open Source Video Generation Tool

Researchers have released Pyramid Flow SD3, a new open-source AI model for video generation based on Stable Diffusion 3. The tool aims to improve upon existing video generation models in quality and consistency.

Prompt: At dusk, a car is driving on the highway, with the rearview mirror reflecting a colorful sunset and serene scenery

Outperforms CogVideoX-2B and is comparable to the 5B version
Offers 384p and 768p model versions
Initial release requires high VRAM (26GB for 384p, 40GB for 768p)
Developers are working on optimizations to reduce VRAM requirements
Researchers plan to release a version trained from scratch to address human structure issues
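
To make the VRAM figures concrete, the small sketch below checks which variants fit on a given GPU. The numbers are the ones reported above for the initial release. The dictionary and function names are illustrative and are not part of the Pyramid Flow code.

```python
# VRAM needed per variant, in GB, as reported for the initial release.
VRAM_REQUIRED_GB = {"384p": 26, "768p": 40}


def runnable_variants(available_gb):
    """Return the model variants whose reported VRAM need fits in available_gb."""
    return [name for name, need in VRAM_REQUIRED_GB.items() if available_gb >= need]
```

By these figures, a 24 GB card cannot run either variant at release, a 32 GB card can run only the 384p model, and 40 GB or more is needed for 768p. The planned optimizations would change these thresholds.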

Project Page | GitHub Page | Reddit Thread

EdgeRunner: NVIDIA's High-Quality 3D Mesh Generator

NVIDIA has introduced EdgeRunner, a new AI-powered tool that can generate high-quality 3D meshes from images and point clouds. This technology represents a significant advancement in automated 3D modeling.

Generates 3D meshes with up to 4,000 faces at a spatial resolution of 512
Works with both images and point clouds as input
Produces meshes that resemble human-created models more closely than previous AI methods
Potential applications in game development, 3D printing, and virtual reality content creation
May significantly reduce the time and effort required for 3D modeling tasks

Reddit Thread | NVIDIA Source Link

ViBiDSampler: Generate High-Quality Frames Between Two Keyframes