System Requirements
The tool runs on Windows, macOS, and Linux. Before installing, make sure your machine meets the following requirements:
Windows 10 / 11
Also supported on macOS 12+ and Ubuntu 20.04+. Windows is recommended for easiest setup.
Python 3.10+
Required for running the Streamlit web UI and all backend processing. Python 3.11 recommended.
4 GB RAM minimum
8 GB recommended for smooth performance when generating longer videos with multiple scenes.
FFmpeg
Required for video assembly and subtitle rendering. Must be on your system PATH. Free to download.
ImageMagick
Required for rendering styled subtitle text onto video frames. Install with default settings.
Free Pexels API Key
Register at pexels.com/api. The key is free; searches and downloads are subject to Pexels' default API rate limits, so check the current limits and license terms before heavy use.
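Before installing, the locally checkable requirements above can be verified with a short script. This is a minimal sketch, not part of the tool: the function name `check_requirements` is hypothetical, and it only tests the Python version and whether FFmpeg and ImageMagick are discoverable on PATH. RAM and the Pexels key are not checked.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the requirements list


def check_requirements() -> dict[str, bool]:
    """Return a pass/fail flag for each requirement that can be checked locally."""
    # ImageMagick 7 installs a `magick` binary; older releases ship `convert`.
    # On Windows, `convert` is also an unrelated system utility, so only
    # `magick` is accepted there.
    if sys.platform == "win32":
        magick_names = ("magick",)
    else:
        magick_names = ("magick", "convert")
    return {
        "python": sys.version_info >= MIN_PYTHON,
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "imagemagick": any(shutil.which(name) is not None for name in magick_names),
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name:12} {'OK' if ok else 'MISSING'}")
```

If FFmpeg or ImageMagick is reported as missing after installation, the usual cause is that its install folder was not added to PATH or the terminal was not restarted.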
Setting Up
Follow these four steps to install and launch the tool on your machine:
Clone the repository
Open a terminal in the folder where you want to install the tool and run:
# Contact TechVisionEra to get started with the AI video generator tool
python -m venv venv
venv\Scripts\activate # Windows
source venv/bin/activate # macOS / Linux
pip install -r requirements.txt
Configure config.toml
Copy config.example.toml to config.toml in the project root. Set your LLM provider — Pollinations AI is pre-configured and requires no API key:
# config.toml
[app]
llm_provider = "pollinations"
[pollinations]
model = "mistral"
Add your Pexels API key
In config.toml, paste your free Pexels API key. This enables automatic stock footage downloads for every scene:
[pexels]
api_key = "YOUR_FREE_PEXELS_KEY_HERE"
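A quick sanity check of config.toml before the first launch can catch a forgotten key. The sketch below is an assumption-laden helper, not something shipped with the tool: `find_config_problems` is a hypothetical name, and it only inspects the `[app]` and `[pexels]` keys shown above. It uses `tomllib`, which is in the standard library from Python 3.11; on Python 3.10 the third-party `tomli` package offers the same API.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport for Python 3.10

# Placeholder value from the example above; a real key must replace it.
PLACEHOLDER = "YOUR_FREE_PEXELS_KEY_HERE"


def find_config_problems(toml_text: str) -> list[str]:
    """Return human-readable problems found in the config text (empty if none)."""
    config = tomllib.loads(toml_text)
    problems = []

    if not config.get("app", {}).get("llm_provider"):
        problems.append("[app] llm_provider is not set")

    key = config.get("pexels", {}).get("api_key", "")
    if not key or key == PLACEHOLDER:
        problems.append("[pexels] api_key is missing or still the placeholder")

    return problems
```

A typical call would be `find_config_problems(Path("config.toml").read_text(encoding="utf-8"))` with `Path` imported from `pathlib`; an empty list means both settings are present.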
Launch the web UI
Start the Streamlit interface and open your browser to localhost:8501:
venv\Scripts\python main.py
venv\Scripts\streamlit run webui\Main.py --browser.gatherUsageStats false
Writing Your Script with Level 3 Scene Tags
The AI generates a complete script from your topic, but you can make footage selection more precise by embedding scene tags anywhere in the text. Tags are written inside square brackets and tell the system which Pexels footage to search for, sentence by sentence.
Rule: Scene tags are stripped from the narration before voice-over synthesis, so they are never spoken aloud. They only affect what footage is searched.
Example Script with Scene Tags (Fractify AI video)
Each bracketed tag becomes a Pexels search term. The tool downloads the best matching clip, trims it to fit the sentence duration, and assembles the final video in sequence.
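The tag handling described above can be sketched in a few lines. This is an illustration of the idea only, not the tool's parser: it assumes a tag is any text inside square brackets, splits sentences naively on `.`, `!`, or `?`, and returns `None` as the search term for untagged sentences. The function name `split_scenes` and the sample script are invented for this sketch.

```python
import re
from typing import Optional

# Assumption: a scene tag is any run of text inside one pair of square brackets.
TAG = re.compile(r"\[([^\[\]]+)\]")


def split_scenes(script: str) -> list[tuple[Optional[str], str]]:
    """Return (search_term, narration) pairs, one per sentence.

    search_term is None when the sentence carries no tag.
    """
    scenes = []
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", script.strip()):
        if not sentence:
            continue
        match = TAG.search(sentence)
        term = match.group(1).strip() if match else None
        # Strip every tag so it is never sent to voice-over synthesis.
        narration = re.sub(r"\s{2,}", " ", TAG.sub("", sentence)).strip()
        scenes.append((term, narration))
    return scenes


if __name__ == "__main__":
    sample = (
        "[doctor reviewing x-ray] AI reads the scan in seconds. "
        "The city [night skyline] never sleeps."
    )
    for term, narration in split_scenes(sample):
        print(term, "->", narration)
```

Each pair then maps to one clip: the first element is what gets searched on Pexels, the second is what gets spoken.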
Choosing Visual Levels
Three levels of visual control are stacked on top of each other. You can use all three simultaneously for maximum control.
Level 1 — Shot Style Options
| Style Name | Pexels Search Prefix | Best For |
|---|---|---|
| Cinematic Aerial | cinematic aerial | Travel, real estate, sweeping landscapes |
| Close Up Dramatic | close up dramatic | Product demos, medical, emotional storytelling |
| Wide Angle Professional | wide angle professional | Corporate, office, architecture |
| Documentary | documentary | Education, news, factual content |
| Time Lapse | time lapse | City life, nature, construction progress |
| Slow Motion | slow motion | Sports, dramatic moments, product showcases |
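To see how a shot style could shape the search, the sketch below prepends the style's prefix to a search term and builds a request URL for the Pexels video search endpoint. How the tool actually combines prefix and term is an assumption here, and `build_search_url` is a hypothetical helper. The request itself is not sent; Pexels expects the API key in the `Authorization` header.

```python
from urllib.parse import urlencode

# Prefixes copied from the Level 1 table.
SHOT_STYLES = {
    "Cinematic Aerial": "cinematic aerial",
    "Close Up Dramatic": "close up dramatic",
    "Wide Angle Professional": "wide angle professional",
    "Documentary": "documentary",
    "Time Lapse": "time lapse",
    "Slow Motion": "slow motion",
}

PEXELS_VIDEO_SEARCH = "https://api.pexels.com/videos/search"


def build_search_url(style: str, search_term: str, per_page: int = 5) -> str:
    """Build a Pexels video search URL with the style prefix prepended (assumed behavior)."""
    query = f"{SHOT_STYLES[style]} {search_term}".strip()
    return f"{PEXELS_VIDEO_SEARCH}?{urlencode({'query': query, 'per_page': per_page})}"


if __name__ == "__main__":
    # The key would be sent as a header, e.g. {"Authorization": "<your key>"}.
    print(build_search_url("Time Lapse", "city traffic"))
```

For example, the Time Lapse style with the term "city traffic" yields the query `time lapse city traffic`.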
Level 2 — Color Grading Options
| Grade Name | Effect | Best For |
|---|---|---|
| Normal | No filter applied | General purpose, matching original footage |
| Cinematic Letterbox | Black bars + slight teal-orange grade | Film-style, premium brand videos |
| Warm Golden | Yellow-orange lift in shadows | Lifestyle, travel, food, wellness |
| Cool Blue | Blue-teal shift across midtones | Tech, medical, corporate, finance |
| Moody Dark | Crushed blacks, desaturated midtones | Drama, security, luxury, thriller |
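For readers who want to experiment with similar looks outside the tool, the grades above can be roughly approximated with standard FFmpeg filters. The filter chains and numeric values below are illustrative guesses, not the tool's actual settings, and `ffmpeg_command` is a hypothetical helper that only builds the argument list.

```python
# Approximate FFmpeg filter chains for each grade (values are assumptions).
GRADE_FILTERS = {
    "Normal": "",
    # Slight teal shadows / orange highlights, then black bars top and bottom.
    "Cinematic Letterbox": (
        "colorbalance=rs=-0.05:bs=0.05:rh=0.05:bh=-0.05,"
        "drawbox=x=0:y=0:w=iw:h=ih*0.1:color=black:t=fill,"
        "drawbox=x=0:y=ih*0.9:w=iw:h=ih*0.1:color=black:t=fill"
    ),
    # Push shadows toward yellow-orange.
    "Warm Golden": "colorbalance=rs=0.10:gs=0.05:bs=-0.10",
    # Push midtones toward blue-teal.
    "Cool Blue": "colorbalance=rm=-0.05:gm=0.03:bm=0.10",
    # Darken the tone curve and reduce saturation.
    "Moody Dark": "curves=preset=darker,eq=saturation=0.7",
}


def ffmpeg_command(src: str, dst: str, grade: str) -> list[str]:
    """Build an ffmpeg argument list that applies the chosen grade to src."""
    cmd = ["ffmpeg", "-y", "-i", src]
    video_filter = GRADE_FILTERS[grade]
    if video_filter:
        cmd += ["-vf", video_filter]
    # Audio is copied unchanged; only the video stream is re-encoded.
    cmd += ["-c:a", "copy", dst]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once FFmpeg is on PATH.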
Level 3 — Scene Tag Pattern
Place scene tags anywhere in your script using this exact pattern:
The tag is stripped before TTS synthesis. The remaining sentence becomes the voice-over for that clip. Multiple tags can appear in a single paragraph — each sentence with a tag gets its own clip.
Arabic Voice Guide
Through Edge TTS, the tool supports 22 Arabic dialect variants. The 8 voices below are the most commonly used for video production and cover all major Arabic-speaking regions:
| Voice Name | Dialect | Gender | Best Use |
|---|---|---|---|
| ar-SA-HamedNeural | Saudi Arabia | Male | Formal, educational, corporate; pan-Arab reach |
| ar-SA-ZariyahNeural | Saudi Arabia | Female | Formal narration, news-style, Gulf audience |
| ar-EG-SalmaNeural | Egypt | Female | Conversational, marketing, social media |
| ar-EG-ShakirNeural | Egypt | Male | Storytelling, explainer videos, wide recognition |
| ar-AE-FatimaNeural | UAE | Female | Luxury brands, tech, Gulf regional content |
| ar-AE-HamdanNeural | UAE | Male | Business, finance, Gulf corporate |
| ar-JO-SanaNeural | Jordan | Female | Education, Levant audience, calm tone |
| ar-MA-MounaNeural | Morocco | Female | North Africa / Maghreb regional content |
Recommendation: For content targeting all Arabic speakers (pan-Arab), use ar-SA-HamedNeural (male) or ar-EG-SalmaNeural (female). Saudi and Egyptian Arabic have the broadest comprehension across the Arab world.
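To audition a voice outside the web UI, the open-source `edge-tts` Python package can synthesize a test clip directly. The sketch below is independent of the tool's own code: the lookup table is copied from the voice table above, while `pick_voice` and `synthesize` are hypothetical helpers. Synthesis needs `pip install edge-tts` and an internet connection.

```python
# (dialect, gender) -> Edge TTS voice name, copied from the table above.
ARABIC_VOICES = {
    ("Saudi Arabia", "Male"): "ar-SA-HamedNeural",
    ("Saudi Arabia", "Female"): "ar-SA-ZariyahNeural",
    ("Egypt", "Female"): "ar-EG-SalmaNeural",
    ("Egypt", "Male"): "ar-EG-ShakirNeural",
    ("UAE", "Female"): "ar-AE-FatimaNeural",
    ("UAE", "Male"): "ar-AE-HamdanNeural",
    ("Jordan", "Female"): "ar-JO-SanaNeural",
    ("Morocco", "Female"): "ar-MA-MounaNeural",
}


def pick_voice(dialect: str, gender: str) -> str:
    """Look up a voice name; raises KeyError if the pair is not in the table."""
    return ARABIC_VOICES[(dialect, gender)]


async def synthesize(text: str, voice: str, out_path: str) -> None:
    """Write the spoken text to out_path as audio using edge-tts."""
    import edge_tts  # third-party package, imported lazily so lookups work without it

    await edge_tts.Communicate(text, voice).save(out_path)
```

A test clip can then be produced with `asyncio.run(synthesize("مرحبا بكم", pick_voice("Egypt", "Female"), "sample.mp3"))`.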
Brand Presets Guide
TechVisionEra has configured four built-in brand presets. Selecting a preset in the UI auto-fills all branding fields — subtitle color, watermark text, background music, and outro card content — with a single click.
| Brand | Watermark | Subtitle Color | BGM Style | Outro Text |
|---|---|---|---|---|
| Fractify AI | fractify.net | #38bdf8 (Cool Blue) | Ambient Tech | "AI Diagnostics — fractify.net" |
| TechVisionEra | techvisionera.com | #00b619 (Green) | Upbeat Corporate | "Digital Growth — techvisionera.com" |
| Study Malaysia | studymalaysia.tv | #38bdf8 (Sky Blue) | Inspirational | "Study in Malaysia — studymalaysia.tv" |
| Vetta Engineering | vetta.techvisionera.com | #fce057 (Yellow) | Dramatic Industrial | "Engineering Solutions — vetta.techvisionera.com" |
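Conceptually, a preset is just a bundle of branding values that gets copied into the form, with any field still open to manual override. The sketch below models that idea with a dataclass. The attribute names mirror the table columns and are illustrative only; they are not the tool's config.toml field names, and `apply_preset` is a hypothetical helper.

```python
import re
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BrandPreset:
    watermark: str
    subtitle_color: str  # hex color such as "#38bdf8"
    bgm_style: str
    outro_text: str

    def __post_init__(self) -> None:
        # Reject anything that is not a 6-digit hex color.
        if not re.fullmatch(r"#[0-9a-fA-F]{6}", self.subtitle_color):
            raise ValueError(f"invalid subtitle color: {self.subtitle_color!r}")


# Values copied from the presets table above.
PRESETS = {
    "Fractify AI": BrandPreset(
        "fractify.net", "#38bdf8", "Ambient Tech", "AI Diagnostics — fractify.net"
    ),
    "TechVisionEra": BrandPreset(
        "techvisionera.com", "#00b619", "Upbeat Corporate",
        "Digital Growth — techvisionera.com",
    ),
    "Study Malaysia": BrandPreset(
        "studymalaysia.tv", "#38bdf8", "Inspirational",
        "Study in Malaysia — studymalaysia.tv",
    ),
    "Vetta Engineering": BrandPreset(
        "vetta.techvisionera.com", "#fce057", "Dramatic Industrial",
        "Engineering Solutions — vetta.techvisionera.com",
    ),
}


def apply_preset(name: str, **overrides: str) -> BrandPreset:
    """Return the named preset, with any individual fields overridden."""
    return replace(PRESETS[name], **overrides)
```

For instance, `apply_preset("TechVisionEra", subtitle_color="#ffffff")` keeps the watermark, music, and outro but switches the subtitles to white; an invalid color raises `ValueError`.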
Custom Brand Configuration
To add your own brand, edit config.toml under the [brand_presets] section. Each preset accepts the following fields: