How to Generate Videos with AI — Free, Step-by-Step Guide

A complete walkthrough of the AI Video Generator, an open-source tool configured by TechVisionEra. Learn how to install it, write scene-tagged scripts, choose visual levels, set up Arabic voice-over, and apply brand presets, all for free.

System Requirements

The tool runs on Windows, Mac, and Linux. Before installing, make sure your machine meets the following requirements:

Windows 10 / 11

Also supported on macOS 12+ and Ubuntu 20.04+. Windows is recommended for easiest setup.

Python 3.10+

Required for running the Streamlit web UI and all backend processing. Python 3.11 recommended.

4 GB RAM minimum

8 GB recommended for smooth performance when generating longer videos with multiple scenes.

FFmpeg

Required for video assembly and subtitle rendering. Must be on your system PATH. Free to download.

ImageMagick

Required for rendering styled subtitle text onto video frames. Install with default settings.

Free Pexels API Key

Register at pexels.com/api. The API is free to use, but requests are rate-limited, so check the Pexels documentation for the current quota.
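The requirements above can be checked before installation with a short script. This is a minimal sketch and not part of the tool: the function name check_requirements is hypothetical, and it assumes ImageMagick is exposed as either magick (version 7) or convert (older releases).

```python
import shutil
import sys

# Assumed executable names. ImageMagick 7 ships `magick`; older releases use
# `convert`. On Windows a `convert` hit may be the unrelated system utility.
REQUIRED_TOOLS = {
    "ffmpeg": ("ffmpeg",),
    "imagemagick": ("magick", "convert"),
}


def check_requirements(min_python=(3, 10)):
    """Return a dict mapping each requirement name to True or False."""
    report = {"python": sys.version_info[:2] >= min_python}
    for name, candidates in REQUIRED_TOOLS.items():
        # shutil.which returns None when the executable is not on PATH.
        report[name] = any(shutil.which(exe) for exe in candidates)
    return report


if __name__ == "__main__":
    for requirement, ok in check_requirements().items():
        print(f"{requirement:12} {'OK' if ok else 'MISSING'}")
```

The script only confirms that the executables are discoverable on PATH. It does not check RAM or validate the Pexels key.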

Setting Up

Follow these four steps to install and launch the tool on your machine:

Clone the repository

Open a terminal in the folder where you want to install the tool and run:

# Contact TechVisionEra to get started with the AI video generator tool
python -m venv venv
venv\Scripts\activate              # Windows
# source venv/bin/activate         # macOS / Linux
pip install -r requirements.txt

Configure config.toml

Copy config.example.toml to config.toml in the project root. Set your LLM provider — Pollinations AI is pre-configured and requires no API key:

# config.toml
[app]
llm_provider = "pollinations"

[pollinations]
model = "mistral"

Add your Pexels API key

In config.toml, paste your free Pexels API key. This enables automatic stock footage downloads for every scene:

[pexels]
api_key = "YOUR_FREE_PEXELS_KEY_HERE"
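A forgotten placeholder key is an easy mistake to miss, so a quick sanity check of config.toml may help before the first launch. The sketch below is an assumption about how such a check could look, not the tool's own validation. The helper find_config_problems is hypothetical, and only the keys shown in this guide are inspected.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport for Python 3.10

PLACEHOLDER_KEY = "YOUR_FREE_PEXELS_KEY_HERE"


def find_config_problems(toml_text):
    """Return a list of human-readable problems found in the config text."""
    config = tomllib.loads(toml_text)
    problems = []
    if not config.get("app", {}).get("llm_provider"):
        problems.append("[app] llm_provider is not set")
    api_key = config.get("pexels", {}).get("api_key", "")
    if not api_key or api_key == PLACEHOLDER_KEY:
        problems.append("[pexels] api_key is missing or still the placeholder")
    return problems


SAMPLE = """
[app]
llm_provider = "pollinations"

[pollinations]
model = "mistral"

[pexels]
api_key = "YOUR_FREE_PEXELS_KEY_HERE"
"""

for problem in find_config_problems(SAMPLE):
    print(problem)
```

With the sample text, the only reported problem is the unchanged Pexels placeholder. To check a real file, pass its contents with find_config_problems(open("config.toml", encoding="utf-8").read()).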

Launch the web UI

Start the Streamlit interface and open your browser to localhost:8501:

venv\Scripts\python main.py
venv\Scripts\streamlit run webui\Main.py --browser.gatherUsageStats false

Writing Your Script with Level 3 Scene Tags

The AI generates a complete script from your topic, but you can enhance footage precision by embedding scene tags anywhere in the text. Tags are written inside square brackets and tell the system exactly what Pexels footage to search for each sentence.

Rule: Scene tags are stripped from the narration before voice-over synthesis, so they are never spoken aloud. They only affect what footage is searched.

Example Script with Scene Tags (Fractify AI video)

# Script with Level 3 scene control tags
[doctor reviewing X-ray scan on computer monitor in clinic] Fractify AI analyses bone X-rays using deep learning to detect fractures and early signs of bone cancer within seconds.
[close up of fracture line highlighted on X-ray image] The system highlights the exact location of the fracture, providing radiologists with a second opinion instantly.
[medical team discussing results around digital display] Clinicians receive a confidence score alongside the detection, helping them prioritise urgent cases faster.
[aerial view of modern hospital building at golden hour] Fractify is deployed in hospitals across three countries, processing over 10,000 scans per month.

Each bracketed tag becomes a Pexels search term. The tool downloads the best matching clip, trims it to fit the sentence duration, and assembles the final video in sequence.
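To make the tag-to-clip mapping concrete, here is a minimal sketch of how a tagged script could be split into footage queries and spoken text. It is an illustration under assumptions, not the tool's actual parser: the function names parse_scenes and narration_only are hypothetical, and text before the first tag is simply ignored.

```python
import re

# A tag is any text in square brackets; the narration for that tag is
# everything that follows it up to the next opening bracket.
SCENE_PATTERN = re.compile(r"\[([^\[\]]+)\]\s*([^\[]*)")


def parse_scenes(script):
    """Split a tagged script into (search_query, narration) pairs."""
    return [
        (tag.strip(), narration.strip())
        for tag, narration in SCENE_PATTERN.findall(script)
    ]


def narration_only(script):
    """Remove every tag so that only the spoken text remains."""
    without_tags = re.sub(r"\[[^\[\]]*\]", " ", script)
    return " ".join(without_tags.split())


script = (
    "[doctor reviewing X-ray scan on computer monitor in clinic] "
    "Fractify AI analyses bone X-rays. "
    "[close up of fracture line highlighted on X-ray image] "
    "The system highlights the exact location of the fracture."
)

for query, narration in parse_scenes(script):
    print(f"search: {query!r}\n  voice: {narration!r}")
```

Each pair corresponds to one clip: the first element would go to the footage search and the second to voice-over synthesis.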

Choosing Visual Levels

Three levels of visual control are stacked on top of each other. You can use all three simultaneously for maximum control.

Level 1 — Shot Style Options

Style Name | Pexels Search Prefix | Best For
Cinematic Aerial | cinematic aerial | Travel, real estate, sweeping landscapes
Close Up Dramatic | close up dramatic | Product demos, medical, emotional storytelling
Wide Angle Professional | wide angle professional | Corporate, office, architecture
Documentary | documentary | Education, news, factual content
Time Lapse | time lapse | City life, nature, construction progress
Slow Motion | slow motion | Sports, dramatic moments, product showcases
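The search prefixes in the table above suggest that a shot style is combined with the scene description before Pexels is queried. The sketch below shows one way this could work. It assumes the prefix is simply prepended to the tag text, which this guide does not confirm, and the helper names are hypothetical. The endpoint and the Authorization header follow the public Pexels video search API. The request is only built here, not sent.

```python
import urllib.parse
import urllib.request

# Prefixes copied from the shot-style table; the lookup itself is illustrative.
SHOT_STYLE_PREFIXES = {
    "Cinematic Aerial": "cinematic aerial",
    "Close Up Dramatic": "close up dramatic",
    "Wide Angle Professional": "wide angle professional",
    "Documentary": "documentary",
    "Time Lapse": "time lapse",
    "Slow Motion": "slow motion",
}

PEXELS_VIDEO_SEARCH = "https://api.pexels.com/videos/search"


def build_query(scene_tag, style=None):
    """Prepend the style prefix (if any) to the scene description."""
    prefix = SHOT_STYLE_PREFIXES.get(style, "")
    return f"{prefix} {scene_tag}".strip()


def build_search_request(query, api_key, per_page=5):
    """Create (but do not send) a Pexels video search request."""
    params = urllib.parse.urlencode({"query": query, "per_page": per_page})
    return urllib.request.Request(
        f"{PEXELS_VIDEO_SEARCH}?{params}",
        headers={"Authorization": api_key},
    )


request = build_search_request(
    build_query("modern hospital building", "Time Lapse"), "YOUR_KEY"
)
print(request.full_url)
```

Passing the request to urllib.request.urlopen would perform the search. That step is left out so the example runs without a key or network access.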

Level 2 — Color Grading Options

Grade Name | Effect | Best For
Normal | No filter applied | General purpose, matching original footage
Cinematic Letterbox | Black bars + slight teal-orange grade | Film-style, premium brand videos
Warm Golden | Yellow-orange lift in shadows | Lifestyle, travel, food, wellness
Cool Blue | Blue-teal shift across midtones | Tech, medical, corporate, finance
Moody Dark | Crushed blacks, desaturated midtones | Drama, security, luxury, thriller
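For readers curious how grades like these can be produced, the sketch below expresses each one as an FFmpeg filter chain. This is an assumption for illustration only: the guide does not say how the tool implements grading. The filters colorbalance, eq, and drawbox are standard FFmpeg filters, but the numeric values are guesses, and the names GRADE_FILTERS and ffmpeg_grade_command are hypothetical.

```python
# Hypothetical mapping from grade name to an FFmpeg -vf filter chain.
# The numeric values are illustrative guesses, not the tool's settings.
GRADE_FILTERS = {
    "Normal": "",
    "Cinematic Letterbox": (
        "colorbalance=gs=0.03:bs=0.08:rh=0.08,"
        "drawbox=x=0:y=0:w=iw:h=ih*0.12:color=black:t=fill,"
        "drawbox=x=0:y=ih*0.88:w=iw:h=ih*0.12:color=black:t=fill"
    ),
    "Warm Golden": "colorbalance=rs=0.12:gs=0.05:bs=-0.12",
    "Cool Blue": "colorbalance=rm=-0.08:gm=0.03:bm=0.12",
    "Moody Dark": "eq=contrast=1.2:brightness=-0.05:saturation=0.7",
}


def ffmpeg_grade_command(src, dst, grade):
    """Build an ffmpeg argument list that applies the chosen grade."""
    command = ["ffmpeg", "-y", "-i", src]
    filters = GRADE_FILTERS[grade]
    if filters:
        command += ["-vf", filters]
    # Copy the audio stream untouched; only the video is re-encoded.
    command += ["-c:a", "copy", dst]
    return command


print(" ".join(ffmpeg_grade_command("clip.mp4", "graded.mp4", "Cool Blue")))
```

The returned list can be passed to subprocess.run, which avoids shell quoting issues with the filter string.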

Level 3 — Scene Tag Pattern

Place scene tags anywhere in your script using this exact pattern:

[your footage search description here] Your narration sentence continues here.

The tag is stripped before TTS synthesis. The remaining sentence becomes the voice-over for that clip. Multiple tags can appear in a single paragraph — each sentence with a tag gets its own clip.
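Because tags are removed by pattern, a missing or stray bracket would presumably leave tag text in the narration. A small check like the following can catch that before generation. It is a hedged sketch, not a feature of the tool, and find_tag_problems is a hypothetical helper.

```python
def find_tag_problems(script):
    """Return (position, message) pairs for malformed scene tags."""
    problems = []
    open_at = None  # index of the currently open "[", if any
    for index, char in enumerate(script):
        if char == "[":
            if open_at is not None:
                problems.append((index, "nested or unclosed '['"))
            open_at = index
        elif char == "]":
            if open_at is None:
                problems.append((index, "']' without a matching '['"))
            elif not script[open_at + 1:index].strip():
                problems.append((open_at, "empty tag"))
            open_at = None
    if open_at is not None:
        problems.append((open_at, "tag is never closed"))
    return problems


for position, message in find_tag_problems("[city skyline at night Hello."):
    print(f"character {position}: {message}")
```

An empty list means every bracket pair is balanced and every tag contains a description.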

Arabic Voice Guide

Through Edge TTS, the tool supports 22 Arabic dialect variants. The 8 voices below are the most commonly used for video production and cover the major Arabic-speaking regions:

Voice Name | Dialect | Gender | Best Use
ar-SA-HamedNeural | Saudi Arabia | Male | Formal, educational, corporate; pan-Arab reach
ar-SA-ZariyahNeural | Saudi Arabia | Female | Formal narration, news-style, Gulf audience
ar-EG-SalmaNeural | Egypt | Female | Conversational, marketing, social media
ar-EG-ShakirNeural | Egypt | Male | Storytelling, explainer videos, wide recognition
ar-AE-FatimaNeural | UAE | Female | Luxury brands, tech, Gulf regional content
ar-AE-HamdanNeural | UAE | Male | Business, finance, Gulf corporate
ar-JO-SanaNeural | Jordan | Female | Education, Levant audience, calm tone
ar-MA-MounaNeural | Morocco | Female | North Africa / Maghreb regional content

Recommendation: For content targeting all Arabic speakers (pan-Arab), use ar-SA-HamedNeural (male) or ar-EG-SalmaNeural (female). Saudi and Egyptian Arabic have the broadest comprehension across the Arab world.
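Outside the web UI, the same voices can be tried directly with the third-party edge-tts Python package. The sketch below assumes that package is installed (pip install edge-tts) and that network access is available for synthesis. The lookup table and the helpers pick_voice, synthesize, and save_intro are hypothetical and only mirror the voice table above.

```python
import asyncio

# Voices copied from the table above, keyed by (locale, gender).
ARABIC_VOICES = {
    ("ar-SA", "male"): "ar-SA-HamedNeural",
    ("ar-SA", "female"): "ar-SA-ZariyahNeural",
    ("ar-EG", "female"): "ar-EG-SalmaNeural",
    ("ar-EG", "male"): "ar-EG-ShakirNeural",
    ("ar-AE", "female"): "ar-AE-FatimaNeural",
    ("ar-AE", "male"): "ar-AE-HamdanNeural",
    ("ar-JO", "female"): "ar-JO-SanaNeural",
    ("ar-MA", "female"): "ar-MA-MounaNeural",
}


def pick_voice(locale, gender, default="ar-SA-HamedNeural"):
    """Look up a voice, falling back to the pan-Arab default."""
    return ARABIC_VOICES.get((locale, gender.lower()), default)


async def synthesize(text, voice, out_path):
    """Write the spoken text to an audio file using edge-tts."""
    import edge_tts  # third-party, imported lazily so the lookup works without it

    await edge_tts.Communicate(text, voice).save(out_path)


def save_intro(out_path="intro.mp3"):
    """Example entry point: needs network access and the edge-tts package."""
    asyncio.run(synthesize("مرحبا بكم", pick_voice("ar-EG", "female"), out_path))
```

Calling save_intro() writes a short Egyptian Arabic greeting to intro.mp3. Any combination missing from the table falls back to ar-SA-HamedNeural.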

Brand Presets Guide

TechVisionEra has configured four built-in brand presets. Selecting a preset in the UI auto-fills all branding fields — subtitle color, watermark text, background music, and outro card content — with a single click.

Brand | Watermark | Subtitle Color | BGM Style | Outro Text
Fractify AI | fractify.net | #38bdf8 (Cool Blue) | Ambient Tech | "AI Diagnostics — fractify.net"
TechVisionEra | techvisionera.com | #00b619 (Green) | Upbeat Corporate | "Digital Growth — techvisionera.com"
Study Malaysia | studymalaysia.tv | #38bdf8 (Sky Blue) | Inspirational | "Study in Malaysia — studymalaysia.tv"
Vetta Engineering | vetta.techvisionera.com | #fce057 (Yellow) | Dramatic Industrial | "Engineering Solutions — vetta.techvisionera.com"

Custom Brand Configuration

To add your own brand, edit config.toml under the [brand_presets] section. Each preset accepts the following fields:

# config.toml: custom brand preset example
[brand_presets.my_brand]
name = "My Company"
watermark_text = "mycompany.com"
subtitle_color = "#ff6b35"
bgm_style = "upbeat"
outro_text = "Visit mycompany.com"
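A typo in a preset, such as a color name in place of a hex code, may only show up at render time. The sketch below models the five preset fields as a dataclass and validates the color up front. It is an illustrative assumption, not the tool's internal representation, and the BrandPreset class is hypothetical.

```python
import re
from dataclasses import dataclass

HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")


@dataclass(frozen=True)
class BrandPreset:
    name: str
    watermark_text: str
    subtitle_color: str
    bgm_style: str
    outro_text: str

    def __post_init__(self):
        # Reject anything that is not a six-digit hex color such as #ff6b35.
        if not HEX_COLOR.fullmatch(self.subtitle_color):
            raise ValueError(
                f"subtitle_color must look like #rrggbb, got {self.subtitle_color!r}"
            )


fields = {
    "name": "My Company",
    "watermark_text": "mycompany.com",
    "subtitle_color": "#ff6b35",
    "bgm_style": "upbeat",
    "outro_text": "Visit mycompany.com",
}
preset = BrandPreset(**fields)
print(preset.watermark_text, preset.subtitle_color)
```

Building the preset with BrandPreset(**fields) also raises TypeError when a field is missing or misspelled, which catches the most common editing mistakes.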

Frequently Asked Questions

What are the minimum system requirements to run the AI Video Generator?
You need Windows 10/11 (or Mac/Linux), Python 3.10 or higher, at least 4 GB RAM, FFmpeg installed on your PATH, and ImageMagick for subtitle rendering. A free Pexels API key is required for footage downloads.

How do Level 3 scene tags work in the script?
Wrap a scene description in square brackets anywhere in your script, for example: [doctor reviewing X-ray scan on monitor]. The tool extracts the tag as a Pexels search query, downloads matching footage, then strips the tag from the narration text so the voice-over never reads it aloud.

Which Arabic voice dialect is best for formal content?
For formal, pan-Arab content use ar-SA-HamedNeural (Saudi male) or ar-EG-SalmaNeural (Egyptian female). Saudi Arabic is widely understood across Arab countries and sounds professional for educational or corporate video content.
