How to Generate Videos with AI Free: Full Guide

System Requirements

The tool runs on Windows, macOS, and Linux. Before installing, make sure your machine meets the following requirements:

Windows 10 / 11

Also supported on macOS 12+ and Ubuntu 20.04+. Windows is recommended for easiest setup.

Python 3.10+

Required for running the Streamlit web UI and all backend processing. Python 3.11 recommended.

4 GB RAM minimum

8 GB recommended for smooth performance when generating longer videos with multiple scenes.

FFmpeg

Required for video assembly and subtitle rendering. Must be on your system PATH. Free to download.

ImageMagick

Required for rendering styled subtitle text onto video frames. Install with default settings.

Free Pexels API Key

Register at pexels.com/api. The free tier lets you search and download footage at no cost, subject to the request rate limits in Pexels' API terms.
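Before moving on to setup, you can confirm the prerequisites from a terminal. The script below is a minimal sketch, not part of the tool: the function name check_prerequisites is hypothetical, and it assumes FFmpeg is exposed as `ffmpeg` and ImageMagick as `magick` (version 7) or `convert` (older installs).

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the requirements above


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version_info[0]}.{version_info[1]}"
        )
    if which("ffmpeg") is None:
        problems.append("FFmpeg not found on PATH")
    # Assumption: ImageMagick 7 installs `magick`, older installs use `convert`.
    # On Windows, `convert` can also be an unrelated system utility.
    if which("magick") is None and which("convert") is None:
        problems.append("ImageMagick not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print("MISSING:", issue)
    if not issues:
        print("All prerequisites found.")
```

RAM and the Pexels API key are not checked here; the key is only needed once you edit config.toml.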

Setting Up

Follow these four steps to install and launch the tool on your machine:

Clone the repository

Open a terminal in the folder where you want to install the tool and run:

# Contact TechVisionEra to get started with the AI video generator tool
python -m venv venv
venv\Scripts\activate   # Windows
# source venv/bin/activate   # macOS / Linux
pip install -r requirements.txt

Configure config.toml

Copy config.example.toml to config.toml in the project root, then set your LLM provider. Pollinations AI is pre-configured and requires no API key:

# config.toml
[app]
llm_provider = "pollinations"

[pollinations]
model = "mistral"

Add your Pexels API key

In config.toml, paste your free Pexels API key. This enables automatic stock footage downloads for every scene:

[pexels]
api_key = "YOUR_FREE_PEXELS_KEY_HERE"

Launch the web UI

Start the Streamlit interface and open your browser to localhost:8501:

venv\Scripts\python main.py
venv\Scripts\streamlit run webui\Main.py --browser.gatherUsageStats false

Writing Your Script with Level 3 Scene Tags

The AI generates a complete script from your topic, but you can make footage selection more precise by embedding scene tags anywhere in the text. Tags are written inside square brackets and tell the system which Pexels footage to search for with each sentence.

Rule: Scene tags are stripped from the narration before voice-over synthesis, so they are never spoken aloud. They only affect what footage is searched.

Example Script with Scene Tags (Fractify AI video)

# Script with Level 3 scene control tags

[doctor reviewing X-ray scan on computer monitor in clinic]
Fractify AI analyses bone X-rays using deep learning to detect
fractures and early signs of bone cancer within seconds.

[close up of fracture line highlighted on X-ray image]
The system highlights the exact location of the fracture,
providing radiologists with a second opinion instantly.

[medical team discussing results around digital display]
Clinicians receive a confidence score alongside the detection,
helping them prioritise urgent cases faster.

[aerial view of modern hospital building at golden hour]
Fractify is deployed in hospitals across three countries,
processing over 10,000 scans per month.

Each bracketed tag becomes a Pexels search term. The tool downloads the best matching clip, trims it to fit the sentence duration, and assembles the final video in sequence.
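To see how a tagged script breaks down into clips, here is a minimal Python sketch. It is an illustration under assumptions, not the tool's own parser: the names parse_scenes and narration_only are hypothetical, and the sketch assumes tags never nest.

```python
import re

# A tag is any run of text between square brackets (no nested brackets).
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def parse_scenes(script):
    """Split a tagged script into (search_query, narration) pairs."""
    # With one capture group, re.split yields: text, tag, text, tag, text, ...
    parts = TAG_PATTERN.split(script)
    scenes = []
    leading = " ".join(parts[0].split())
    if leading:
        scenes.append((None, leading))  # untagged text before the first tag
    for i in range(1, len(parts), 2):
        query = " ".join(parts[i].split())
        narration = " ".join(parts[i + 1].split())
        scenes.append((query, narration))
    return scenes


def narration_only(script):
    """Text handed to voice-over synthesis: tags removed, whitespace collapsed."""
    return " ".join(TAG_PATTERN.sub(" ", script).split())


SCRIPT = """
[doctor reviewing X-ray scan on computer monitor in clinic]
Fractify AI analyses bone X-rays using deep learning to detect
fractures and early signs of bone cancer within seconds.

[close up of fracture line highlighted on X-ray image]
The system highlights the exact location of the fracture,
providing radiologists with a second opinion instantly.
"""

for query, narration in parse_scenes(SCRIPT):
    print(f"search: {query}\n  voice: {narration}")
```

Each pair corresponds to one clip: the first element is the footage search text, the second is what the voice-over reads.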

Choosing Visual Levels

Three levels of visual control are stacked on top of each other. You can use all three simultaneously for maximum control.

Level 1: Shot Style Options

Style Name	Pexels Search Prefix	Best For
Cinematic Aerial	`cinematic aerial`	Travel, real estate, sweeping landscapes
Close Up Dramatic	`close up dramatic`	Product demos, medical, emotional storytelling
Wide Angle Professional	`wide angle professional`	Corporate, office, architecture
Documentary	`documentary`	Education, news, factual content
Time Lapse	`time lapse`	City life, nature, construction progress
Slow Motion	`slow motion`	Sports, dramatic moments, product showcases
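The sketch below shows one way a shot style could be combined with a scene description to form a Pexels video search. The prefix values come from the table above; simply prepending the prefix is an assumption about how the tool builds its query, and build_query and build_search_request are hypothetical names. The endpoint and Authorization header follow the public Pexels API. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Level 1 shot styles and their search prefixes, copied from the table above.
STYLE_PREFIXES = {
    "Cinematic Aerial": "cinematic aerial",
    "Close Up Dramatic": "close up dramatic",
    "Wide Angle Professional": "wide angle professional",
    "Documentary": "documentary",
    "Time Lapse": "time lapse",
    "Slow Motion": "slow motion",
}

PEXELS_VIDEO_SEARCH = "https://api.pexels.com/videos/search"


def build_query(scene_description, style=None):
    """Prepend the style prefix (if any) to the scene description."""
    prefix = STYLE_PREFIXES.get(style, "")
    return f"{prefix} {scene_description}".strip()


def build_search_request(query, api_key, per_page=5):
    """Build (but do not send) a Pexels video search request."""
    params = urlencode({"query": query, "per_page": per_page})
    return Request(
        f"{PEXELS_VIDEO_SEARCH}?{params}",
        headers={"Authorization": api_key},  # Pexels expects the raw key here
    )


request = build_search_request(
    build_query("modern hospital building", "Cinematic Aerial"),
    api_key="YOUR_FREE_PEXELS_KEY_HERE",
)
print(request.full_url)
```

Passing the request to urllib.request.urlopen would perform the search; that step is left out so the example runs without a key or network access.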

Level 2: Color Grading Options

Grade Name	Effect	Best For
Normal	No filter applied	General purpose, matching original footage
Cinematic Letterbox	Black bars + slight teal-orange grade	Film-style, premium brand videos
Warm Golden	Yellow-orange lift in shadows	Lifestyle, travel, food, wellness
Cool Blue	Blue-teal shift across midtones	Tech, medical, corporate, finance
Moody Dark	Crushed blacks, desaturated midtones	Drama, security, luxury, thriller
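For readers curious how grades like these can be produced, the sketch below maps each grade name to an FFmpeg filter chain. It is an illustration only: the filter choices and every numeric value are guesses made for this example, not the tool's actual settings, and ffmpeg_command is a hypothetical helper. The filters used (colorbalance, eq, drawbox) are standard FFmpeg video filters.

```python
# Illustrative FFmpeg filter chains for each Level 2 grade.
# All numeric values are assumptions chosen for demonstration.
GRADE_FILTERS = {
    "Normal": "",
    "Cinematic Letterbox": (
        # Teal shadows, orange highlights, then black bars top and bottom.
        "colorbalance=rs=-0.05:bs=0.05:rh=0.05:bh=-0.05,"
        "drawbox=x=0:y=0:w=iw:h=ih*0.12:color=black:t=fill,"
        "drawbox=x=0:y=ih*0.88:w=iw:h=ih*0.12:color=black:t=fill"
    ),
    # Push shadows toward yellow-orange.
    "Warm Golden": "colorbalance=rs=0.10:gs=0.05:bs=-0.10",
    # Shift midtones toward blue-teal.
    "Cool Blue": "colorbalance=rm=-0.05:gm=0.02:bm=0.10",
    # Higher contrast, slightly darker, less saturated.
    "Moody Dark": "eq=contrast=1.2:brightness=-0.05:saturation=0.7",
}


def ffmpeg_command(src, dst, grade):
    """Return the argument list for grading one clip with FFmpeg."""
    command = ["ffmpeg", "-y", "-i", src]
    video_filter = GRADE_FILTERS[grade]
    if video_filter:
        command += ["-vf", video_filter]
    command += ["-c:a", "copy", dst]  # leave the audio stream untouched
    return command


print(" ".join(ffmpeg_command("clip.mp4", "graded.mp4", "Moody Dark")))
```

The list can be passed to subprocess.run to execute the grade on a real clip.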

Level 3: Scene Tag Pattern

Place scene tags anywhere in your script using this exact pattern:

[your footage search description here]Your narration sentence continues here.

The tag is stripped before TTS synthesis, and the remaining sentence becomes the voice-over for that clip. Multiple tags can appear in a single paragraph; each sentence with a tag gets its own clip.

Arabic Voice Guide

Through Edge TTS, the tool supports 22 Arabic dialect variants. The 8 voices below are the most commonly used for video production and cover all major Arabic-speaking regions:

Voice Name	Dialect	Gender	Best Use
`ar-SA-HamedNeural`	Saudi Arabia	Male	Formal, educational, corporate, pan-Arab reach
`ar-SA-ZariyahNeural`	Saudi Arabia	Female	Formal narration, news-style, Gulf audience
`ar-EG-SalmaNeural`	Egypt	Female	Conversational, marketing, social media
`ar-EG-ShakirNeural`	Egypt	Male	Storytelling, explainer videos, wide recognition
`ar-AE-FatimaNeural`	UAE	Female	Luxury brands, tech, Gulf regional content
`ar-AE-HamdanNeural`	UAE	Male	Business, finance, Gulf corporate
`ar-JO-SanaNeural`	Jordan	Female	Education, Levant audience, calm tone
`ar-MA-MounaNeural`	Morocco	Female	North Africa / Maghreb regional content

Recommendation: For content targeting all Arabic speakers (pan-Arab), use ar-SA-HamedNeural (male) or ar-EG-SalmaNeural (female). Saudi and Egyptian Arabic are the most widely understood varieties across the Arab world.
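If you want to try one of these voices outside the web UI, the sketch below looks up a voice from the table and synthesizes a short clip with the edge-tts Python package. The lookup table mirrors the voice table above; pick_voice and synthesize are hypothetical helper names, and this is not necessarily how the tool calls Edge TTS internally. Synthesis needs `pip install edge-tts` and an internet connection, so that call is left commented out.

```python
import asyncio

# (dialect, gender) -> Edge TTS voice name, from the table above.
VOICES = {
    ("Saudi Arabia", "Male"): "ar-SA-HamedNeural",
    ("Saudi Arabia", "Female"): "ar-SA-ZariyahNeural",
    ("Egypt", "Female"): "ar-EG-SalmaNeural",
    ("Egypt", "Male"): "ar-EG-ShakirNeural",
    ("UAE", "Female"): "ar-AE-FatimaNeural",
    ("UAE", "Male"): "ar-AE-HamdanNeural",
    ("Jordan", "Female"): "ar-JO-SanaNeural",
    ("Morocco", "Female"): "ar-MA-MounaNeural",
}


def pick_voice(dialect, gender):
    """Return the voice name for a dialect and gender listed in the table."""
    try:
        return VOICES[(dialect, gender)]
    except KeyError:
        raise ValueError(f"No listed voice for {dialect} / {gender}") from None


async def synthesize(text, voice, out_path):
    """Write the spoken text to an audio file using edge-tts."""
    import edge_tts  # third-party package, imported lazily

    await edge_tts.Communicate(text, voice).save(out_path)


print(pick_voice("Egypt", "Female"))
# asyncio.run(synthesize("مرحبا بكم", pick_voice("Egypt", "Female"), "voiceover.mp3"))
```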

Brand Presets Guide

TechVisionEra has configured four built-in brand presets. Selecting a preset in the UI auto-fills all branding fields (subtitle color, watermark text, background music, and outro card content) with a single click.

Brand	Watermark	Subtitle Color	BGM Style	Outro Text
Fractify AI	fractify.net	#38bdf8 (Cool Blue)	Ambient Tech	"AI Diagnostics, fractify.net"
TechVisionEra	techvisionera.com	#00b619 (Green)	Upbeat Corporate	"Digital Growth, techvisionera.com"
Study Malaysia	studymalaysia.tv	#38bdf8 (Sky Blue)	Inspirational	"Study in Malaysia, studymalaysia.tv"
Vetta Engineering	vetta.techvisionera.com	#fce057 (Yellow)	Dramatic Industrial	"Engineering Solutions, vetta.techvisionera.com"

Custom Brand Configuration

To add your own brand, edit config.toml under the [brand_presets] section. Each preset accepts the following fields:

# config.toml, custom brand preset example
[brand_presets.my_brand]
name           = "My Company"
watermark_text = "mycompany.com"
subtitle_color = "#ff6b35"
bgm_style      = "upbeat"
outro_text     = "Visit mycompany.com"
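A quick check of a custom preset before launching the UI can catch typos early. The following is a minimal sketch, not part of the tool: validate_presets is a hypothetical helper, and treating all five fields as required and the color as a six-digit hex value are assumptions based on the example above.

```python
import re

try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # backport for Python 3.10: pip install tomli

# Assumption: every field shown in the example preset is required.
REQUIRED_FIELDS = ("name", "watermark_text", "subtitle_color", "bgm_style", "outro_text")
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")


def validate_presets(toml_text):
    """Return {preset_id: [problems]} for each preset that fails a check."""
    presets = tomllib.loads(toml_text).get("brand_presets", {})
    report = {}
    for preset_id, fields in presets.items():
        problems = [f"missing field: {key}" for key in REQUIRED_FIELDS if key not in fields]
        color = fields.get("subtitle_color")
        if color is not None and not HEX_COLOR.fullmatch(str(color)):
            problems.append(f"subtitle_color is not a #RRGGBB value: {color!r}")
        if problems:
            report[preset_id] = problems
    return report


SAMPLE = """
[brand_presets.my_brand]
name           = "My Company"
watermark_text = "mycompany.com"
subtitle_color = "#ff6b35"
bgm_style      = "upbeat"
outro_text     = "Visit mycompany.com"
"""

print(validate_presets(SAMPLE) or "All presets look valid.")
```

To check your real file, read config.toml as text and pass its contents to validate_presets.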
          

Frequently Asked Questions

What are the minimum system requirements to run the AI Video Generator?

You need Windows 10/11 (or macOS 12+ / Ubuntu 20.04+), Python 3.10 or higher, at least 4 GB RAM, FFmpeg installed on your PATH, and ImageMagick for subtitle rendering. A free Pexels API key is required for footage downloads.

How do Level 3 scene tags work in the script?

Wrap a scene description in square brackets anywhere in your script, for example: [doctor reviewing X-ray scan on monitor]. The tool extracts the tag as a Pexels search query, downloads matching footage, then strips the tag from the narration text so the voice-over never reads it aloud.

Which Arabic voice dialect is best for formal content?

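For formal content, the voice table above lists ar-SA-HamedNeural (Saudi male) and ar-SA-ZariyahNeural (Saudi female). If pan-Arab reach matters more than a formal register, ar-EG-SalmaNeural (Egyptian female) is the recommended female alternative. Saudi Arabic is widely understood across the Arab world and sounds professional in educational or corporate video content.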
