Beginner

Project Setup

Set up the project structure and install the dependencies: yfinance, PyTorch, Transformers (the library used to load FinBERT), and Streamlit. By the end, you will have a working environment that can fetch stock data.

Architecture

Stock Data (yfinance) ---> Technical Indicators ---+
                                                   |---> LSTM Model ---> Predictions ---> Dashboard
News Data (NewsAPI)   ---> FinBERT Sentiment    ---+
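
The two branches of the diagram meet just before the model, so each trading day needs one combined feature row. A minimal sketch of that join in plain Python follows. The function name, the per-day dictionaries, and the choice to give days without news a neutral sentiment of 0.0 are all assumptions made for illustration; the tutorial does not specify how gaps are handled.

```python
from datetime import date


def merge_features(indicators, sentiment, neutral=0.0):
    """Join per-day indicator rows with per-day sentiment scores.

    indicators: dict mapping date -> dict of indicator name -> value
    sentiment:  dict mapping date -> sentiment score
    Days with prices but no news get `neutral` (an assumption of this sketch).
    """
    rows = []
    for day in sorted(indicators):  # keep chronological order for the LSTM
        row = {"date": day, **indicators[day]}
        row["sentiment"] = sentiment.get(day, neutral)
        rows.append(row)
    return rows


# Hypothetical values, for illustration only.
indicators = {
    date(2024, 1, 3): {"close": 101.0, "rsi": 55.0},
    date(2024, 1, 2): {"close": 100.0, "rsi": 50.0},
}
sentiment = {date(2024, 1, 2): 0.4}

features = merge_features(indicators, sentiment)
```

In a real pipeline the same join would more likely be a pandas merge on the date index; the dictionary version only shows the shape of the result.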

Step 1: Project Structure

stock-predictor/
+-- app/
|   +-- config.py             # Settings and API key (Step 3)
|   +-- data_collector.py     # yfinance + NewsAPI
|   +-- indicators.py         # Technical indicators
|   +-- sentiment.py          # FinBERT scoring
|   +-- model.py              # LSTM model
|   +-- backtester.py         # Backtesting engine
|   +-- dashboard.py          # Streamlit app
+-- data/                     # Cached data
+-- models/                   # Saved models
+-- requirements.txt
+-- .env                      # NEWSAPI_KEY (keep out of version control)
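
If you would rather not create these files by hand, the layout can be generated with a short script. This is a sketch using only the standard library; the `scaffold` helper is a hypothetical name, and it creates empty placeholder files that later steps fill in.

```python
from pathlib import Path

# Names follow the tree above; app/config.py is the settings module from Step 3.
APP_FILES = [
    "config.py",
    "data_collector.py",
    "indicators.py",
    "sentiment.py",
    "model.py",
    "backtester.py",
    "dashboard.py",
]
DIRS = ["app", "data", "models"]
ROOT_FILES = ["requirements.txt", ".env"]


def scaffold(root):
    """Create the project skeleton under `root` without overwriting file contents."""
    root = Path(root)
    for d in DIRS:
        (root / d).mkdir(parents=True, exist_ok=True)
    targets = [root / "app" / name for name in APP_FILES]
    targets += [root / name for name in ROOT_FILES]
    for path in targets:
        path.touch(exist_ok=True)  # creates an empty file only if it is missing
    return targets
```

Calling `scaffold("stock-predictor")` from the parent directory produces the structure shown above. Running it a second time is safe because existing files keep their contents.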

Step 2: Dependencies

# requirements.txt
yfinance==0.2.36
pandas==2.2.0
numpy==1.26.3
torch==2.5.1
transformers==4.47.1
scikit-learn==1.4.0
streamlit==1.40.0
plotly==5.18.0
newsapi-python==0.2.7
python-dotenv==1.0.1
ta==0.11.0
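
After installing (for example with `pip install -r requirements.txt`), it can be useful to confirm that the installed versions match the pins. The sketch below uses the standard library's `importlib.metadata`; `parse_pins` and `find_mismatches` are hypothetical helper names, and the parser only understands simple `name==version` lines like the ones above.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(text):
    """Parse 'name==version' lines, skipping blanks and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "==" in line:
            name, pinned = line.split("==", 1)
            pins[name.strip()] = pinned.strip()
    return pins


def find_mismatches(pins):
    """Return {package: (pinned, installed_or_None)} for every unsatisfied pin."""
    problems = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Some builds report a local suffix such as "2.5.1+cu121"; ignore it.
        base = installed.split("+")[0] if installed else None
        if base != pinned:
            problems[name] = (pinned, installed)
    return problems
```

A typical call would read the file and pass its text through both helpers, for instance `find_mismatches(parse_pins(open("requirements.txt").read()))`. An empty dictionary means every pinned package is installed at the expected version.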

Step 3: Configuration

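# app/config.py
import os

from dotenv import load_dotenv

# Read variables from a local .env file into the process environment.
load_dotenv()

NEWSAPI_KEY = os.getenv("NEWSAPI_KEY", "")  # empty string if the key is not set
DEFAULT_TICKER = "AAPL"
LOOKBACK_DAYS = 365 * 2   # about two years of price history
SEQUENCE_LENGTH = 60      # time steps per model input sequence
TRAIN_SPLIT = 0.8         # fraction of the data used for training
BATCH_SIZE = 32
EPOCHS = 50
LEARNING_RATE = 0.001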
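
To make `SEQUENCE_LENGTH` and `TRAIN_SPLIT` concrete, here is one common way such settings are applied to a price series: split chronologically, then slice the training portion into fixed-length windows that each predict the next value. This is a sketch under assumptions, not the tutorial's own code; the helper names are hypothetical, the order (split first, then window) is a choice made here, and plain lists are used to keep it dependency-free where a real pipeline would likely use NumPy arrays.

```python
SEQUENCE_LENGTH = 60  # same values as app/config.py
TRAIN_SPLIT = 0.8


def chronological_split(series, train_frac=TRAIN_SPLIT):
    """Split without shuffling so the test data is strictly later than the training data."""
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]


def make_windows(series, seq_len=SEQUENCE_LENGTH):
    """Turn a 1-D series into (window, next_value) pairs."""
    X, y = [], []
    for start in range(len(series) - seq_len):
        X.append(series[start:start + seq_len])
        y.append(series[start + seq_len])
    return X, y


prices = [float(i) for i in range(100)]  # stand-in for closing prices
train, test = chronological_split(prices)
X_train, y_train = make_windows(train)
print(len(train), len(test), len(X_train))  # 80 20 20
```

Note that a split shorter than `SEQUENCE_LENGTH` yields no windows at all (the 20-point test set here would produce none), which is one reason the lookback period needs to be much longer than the sequence length.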

Step 4: Verify Setup

import yfinance as yf
import torch

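# Test yfinance: download five days of daily prices.
# On failure, yf.download typically returns an empty DataFrame rather than raising.
data = yf.download("AAPL", period="5d")
if data.empty:
    raise SystemExit("No data returned; check the ticker and your network connection.")
print(f"AAPL data: {len(data)} rows")
print(data.tail())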

# Test PyTorch
print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
Disclaimer: This project is for educational purposes only. Stock predictions are inherently uncertain. Never invest real money based solely on model outputs. Past performance does not predict future results.

Key Takeaways

  • The system combines two data sources, price history and news, and derives three kinds of model input from them: prices, technical indicators, and news sentiment.
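  • yfinance, an unofficial wrapper around Yahoo Finance, provides free historical stock data with no API key required; NewsAPI does need a key, which is read from .env.
  • An LSTM built in PyTorch models the sequential time series data for price forecasting.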
  • FinBERT is a pre-trained model specifically fine-tuned for financial text sentiment.