Comprehensive technical documentation for the Homester AI-powered real estate platform backend.
The Homester backend is a production-ready Flask application that combines traditional REST API patterns with modern AI capabilities to provide intelligent real estate search and management functionality.
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Frontend     │    │  Load Balancer  │    │   CDN/Assets    │
│   (React/Vue)   │────│   (nginx/ALB)   │────│  (CloudFront)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │
         │             ┌─────────────────┐
         │             │    Flask API    │
         └─────────────│   (Gunicorn)    │
                       └─────────────────┘
                                │
              ┌─────────────────┼──────────────────┐
              │                 │                  │
     ┌─────────────────┐   ┌──────────┐   ┌─────────────────┐
     │   PostgreSQL    │   │  Redis   │   │   OpenAI API    │
     │  (Primary DB)   │   │ (Cache)  │   │  (AI Services)  │
     └─────────────────┘   └──────────┘   └─────────────────┘
```
```yaml
Framework: Flask 2.3+
API Documentation: Flask-RESTX (OpenAPI/Swagger)
Web Server: Gunicorn with sync workers
ASGI Server: Compatible with Uvicorn for async operations
```

```yaml
Primary Database: PostgreSQL 14+
ORM: SQLAlchemy 2.0+
Connection Pooling: SQLAlchemy pool (size: 10, overflow: 20)
Migrations: Flask-Migrate (Alembic)
Cache Layer: Redis 6+ (optional)
```

```yaml
Language Model: OpenAI GPT-4o
Embeddings: text-embedding-3-small
Vector Storage: PostgreSQL with JSON columns
Similarity Search: Cosine similarity calculation
```

```yaml
Authentication: JWT with Flask-JWT-Extended
Password Hashing: Werkzeug PBKDF2
Rate Limiting: Flask-Limiter
CORS: Flask-CORS
Input Validation: Custom validators
```

```yaml
Containerization: Docker
Orchestration: Kubernetes
Process Management: Gunicorn
Monitoring: Built-in logging + external tools
CI/CD: Docker-based deployment
```
The database follows a normalized relational model with JSON fields for flexible data storage:
```sql
-- Core user management
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(120) UNIQUE NOT NULL,
    password_hash VARCHAR(256) NOT NULL,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    phone VARCHAR(20),
    is_active BOOLEAN DEFAULT TRUE,
    is_verified BOOLEAN DEFAULT FALSE,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Property listings with comprehensive data
CREATE TABLE properties (
    id SERIAL PRIMARY KEY,
    mls_id VARCHAR(50) UNIQUE,
    title VARCHAR(200) NOT NULL,
    description TEXT,
    address VARCHAR(500) NOT NULL,
    city VARCHAR(100) NOT NULL,
    state VARCHAR(50) NOT NULL,
    zip_code VARCHAR(10) NOT NULL,
    country VARCHAR(50) DEFAULT 'USA',

    -- Property details
    price DECIMAL(12,2) NOT NULL,
    bedrooms INTEGER,
    bathrooms DECIMAL(3,1),
    square_feet INTEGER,
    lot_size DECIMAL(8,3),
    year_built INTEGER,
    property_type VARCHAR(50),
    listing_type VARCHAR(20) DEFAULT 'for_sale',

    -- Location data
    latitude DECIMAL(10,8),
    longitude DECIMAL(11,8),

    -- Flexible JSON data
    features JSONB,            -- amenities, parking, etc.
    schools JSONB,             -- nearby schools with ratings
    neighborhood_info JSONB,   -- walkability, transit, etc.
    image_urls JSONB,          -- array of image URLs

    -- Listing information
    virtual_tour_url VARCHAR(500),
    listing_agent_name VARCHAR(100),
    listing_agent_phone VARCHAR(20),
    listing_agent_email VARCHAR(120),
    listing_date TIMESTAMP WITH TIME ZONE,
    status VARCHAR(20) DEFAULT 'active',

    -- AI search optimization
    embedding_vector TEXT,     -- JSON string of vector embeddings

    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Chat system for AI interactions
CREATE TABLE chat_sessions (
    id SERIAL PRIMARY KEY,
    session_id VARCHAR(100) UNIQUE NOT NULL,
    user_id INTEGER REFERENCES users(id),
    title VARCHAR(200),
    is_active BOOLEAN DEFAULT TRUE,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE TABLE chat_messages (
    id SERIAL PRIMARY KEY,
    session_id INTEGER REFERENCES chat_sessions(id) ON DELETE CASCADE,
    role VARCHAR(20) NOT NULL,   -- 'user', 'assistant', 'system'
    content TEXT NOT NULL,
    message_metadata JSONB,      -- additional context, property IDs, etc.
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- User favorites for property bookmarking
CREATE TABLE user_favorites (
    id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(id) ON DELETE CASCADE,
    property_id INTEGER REFERENCES properties(id) ON DELETE CASCADE,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    UNIQUE(user_id, property_id)
);
```
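The `UNIQUE(user_id, property_id)` constraint on `user_favorites` means a repeated bookmark is rejected by the database rather than stored twice. The sketch below illustrates that behavior under stated assumptions: it uses an in-memory SQLite database as a stand-in for PostgreSQL so it runs without a server, it reproduces only the columns needed for the constraint, and `add_favorite` is a hypothetical helper, not a function from the Homester codebase.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (an assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE user_favorites (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        property_id INTEGER NOT NULL,
        UNIQUE (user_id, property_id)
    )
    """
)


def add_favorite(user_id: int, property_id: int) -> bool:
    """Insert a favorite; return False if the pair already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO user_favorites (user_id, property_id) VALUES (?, ?)",
                (user_id, property_id),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised when the UNIQUE constraint rejects a duplicate pair.
        return False


first = add_favorite(1, 10)   # new bookmark
second = add_favorite(1, 10)  # same pair again
```

Handling the constraint violation in one place lets an endpoint treat "already favorited" as a normal outcome instead of a server error.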
```sql
-- Performance optimization indexes
CREATE INDEX idx_properties_price ON properties(price);
CREATE INDEX idx_properties_bedrooms ON properties(bedrooms);
CREATE INDEX idx_properties_bathrooms ON properties(bathrooms);
CREATE INDEX idx_properties_location ON properties(city, state);
CREATE INDEX idx_properties_type ON properties(property_type);
CREATE INDEX idx_properties_status ON properties(status);
CREATE INDEX idx_properties_mls ON properties(mls_id);

-- User and session indexes
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_chat_sessions_user ON chat_sessions(user_id);
CREATE INDEX idx_chat_messages_session ON chat_messages(session_id);
CREATE INDEX idx_user_favorites_user ON user_favorites(user_id);
```
```
Users (1) ──────────── (∞) Chat Sessions
  │                          │
  │                          └── (∞) Chat Messages
  │
  └── (∞) User Favorites ── (1) Properties
```
The API follows a namespace-based organization pattern using Flask-RESTX for automatic OpenAPI documentation generation:
```python
from flask import Flask
from flask_cors import CORS
from flask_restx import Api

# db, jwt, limiter and the *_ns namespaces come from the project's own modules.


def create_app():
    app = Flask(__name__)

    # Initialize extensions
    db.init_app(app)
    jwt.init_app(app)
    CORS(app)
    limiter.init_app(app)

    # Initialize API with documentation
    api = Api(app, doc='/docs/', version='1.0')

    # Register namespaces
    api.add_namespace(auth_ns, path='/api/auth')
    api.add_namespace(properties_ns, path='/api/properties')
    api.add_namespace(chat_ns, path='/api/chat')

    return app
```
Authentication (`/api/auth`)

```python
@auth_ns.route('/register')
class RegisterResource(Resource):
    @auth_ns.expect(register_model)
    @auth_ns.marshal_with(user_response_model)
    def post(self):
        """Register a new user"""
        # Implementation with validation and JWT token generation


@auth_ns.route('/login')
class LoginResource(Resource):
    @auth_ns.expect(login_model)
    @auth_ns.marshal_with(token_response_model)
    def post(self):
        """Authenticate user and return tokens"""
        # Implementation with password verification and token issuance
```
Properties (`/api/properties`)

```python
@properties_ns.route('/search')
class PropertySearchResource(Resource):
    @properties_ns.marshal_with(search_response_model)
    def get(self):
        """Search properties with filters"""
        # Advanced filtering with PostgreSQL queries
        # Integration with vector similarity search


@properties_ns.route('/
```
Chat (`/api/chat`)

```python
@chat_ns.route('/chat')
class ChatResource(Resource):
    @chat_ns.expect(chat_message_model)
    @chat_ns.marshal_with(chat_response_model)
    @limiter.limit("30 per minute")
    def post(self):
        """Send a message and get AI response"""
        # OpenAI integration with property search
        # Vector similarity matching
```
```python
user_model = api.model('User', {
    'id': fields.Integer(required=True),
    'email': fields.String(required=True),
    'first_name': fields.String(),
    'last_name': fields.String(),
    'created_at': fields.DateTime()
})

property_model = api.model('Property', {
    'id': fields.Integer(required=True),
    'title': fields.String(required=True),
    'price': fields.Float(required=True),
    'bedrooms': fields.Integer(),
    'bathrooms': fields.Float(),
    'features': fields.Raw(),  # JSON field
    'image_urls': fields.List(fields.String())
})
```
The AI integration follows a service-oriented architecture with clear separation of concerns:
```python
import os
from typing import Dict, List

from openai import OpenAI


class AIService:
    """Orchestrates AI-powered property search and chat responses"""

    def __init__(self):
        self.openai_client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
        self.model = "gpt-4o"  # OpenAI chat model

    @classmethod
    def get_chat_response(cls, conversation: List[Dict]) -> Dict:
        """Generate AI response with property recommendations"""
        # 1. Analyze user message with GPT-4o
        # 2. Extract search parameters
        # 3. Perform vector similarity search
        # 4. Fallback to traditional filtering
        # 5. Generate contextualized response
```
```python
from typing import Dict, List


class EmbeddingService:
    """Manages OpenAI embeddings for semantic search"""

    def __init__(self):
        self.embedding_model = "text-embedding-3-small"

    @classmethod
    def generate_embedding(cls, text: str) -> List[float]:
        """Generate embedding vector for text"""
        # OpenAI API call for embedding generation

    @classmethod
    def search_properties_by_text(cls, query: str, limit: int = 10) -> List[Dict]:
        """Semantic property search using vector similarity"""
        # 1. Generate query embedding
        # 2. Calculate cosine similarity with stored embeddings
        # 3. Return ranked results
```
Properties are converted to searchable text that captures all relevant information:
```python
def create_property_embedding_text(property_obj: Property) -> str:
    """Create comprehensive text representation for embedding"""
    parts = []

    # Basic information
    parts.append(f"{property_obj.title}")
    parts.append(f"{property_obj.description}")

    # Location details
    parts.append(f"Located in {property_obj.city}, {property_obj.state}")

    # Property characteristics
    parts.append(f"{property_obj.bedrooms} bedrooms, {property_obj.bathrooms} bathrooms")
    parts.append(f"{property_obj.square_feet} square feet")
    parts.append(f"{property_obj.property_type}")

    # Features and amenities
    if property_obj.features:
        for category, items in property_obj.features.items():
            parts.append(f"{category}: {', '.join(items)}")

    return " ".join(parts)
```
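Many listing columns are nullable in the schema, and values inside `features` may be lists or scalars, so a text builder may want to skip missing fields instead of embedding the literal word "None". The following is a minimal sketch of that idea, not the project's implementation: `PropertyStub` is a hypothetical stand-in for the SQLAlchemy `Property` model and `embedding_text` is an illustrative name.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class PropertyStub:
    """Hypothetical stand-in for the SQLAlchemy Property model."""
    title: str
    city: str
    state: str
    description: Optional[str] = None
    bedrooms: Optional[int] = None
    features: Dict[str, Any] = field(default_factory=dict)


def embedding_text(prop: PropertyStub) -> str:
    """Build embedding text, omitting fields that are missing."""
    parts: List[str] = [prop.title]
    if prop.description:
        parts.append(prop.description)
    parts.append(f"Located in {prop.city}, {prop.state}")
    if prop.bedrooms is not None:
        parts.append(f"{prop.bedrooms} bedrooms")
    for category, items in prop.features.items():
        # Feature values may be lists or scalars; normalize both to text.
        values = items if isinstance(items, list) else [items]
        parts.append(f"{category}: {', '.join(str(v) for v in values)}")
    return " ".join(parts)


text = embedding_text(
    PropertyStub(
        title="Sunny bungalow",
        city="Austin",
        state="TX",
        bedrooms=3,
        features={"parking": ["garage", "street"], "pool": True},
    )
)
```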
```python
from typing import List


def _cosine_similarity(vec1: List[float], vec2: List[float]) -> float:
    """Calculate cosine similarity between two vectors"""
    dot_product = sum(a * b for a, b in zip(vec1, vec2))
    magnitude1 = sum(a * a for a in vec1) ** 0.5
    magnitude2 = sum(b * b for b in vec2) ** 0.5

    if magnitude1 == 0 or magnitude2 == 0:
        return 0.0

    return dot_product / (magnitude1 * magnitude2)
```
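Because `embedding_vector` is stored as a JSON string, a ranking step has to decode each stored vector before scoring it against the query embedding. The sketch below shows one way this might look. It is an illustration under assumptions: `rank_by_similarity` and the row dictionaries are hypothetical, and the similarity function is repeated only so the block runs on its own.

```python
import json
from typing import Dict, List, Tuple


def cosine_similarity(vec1: List[float], vec2: List[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero magnitude."""
    dot_product = sum(a * b for a, b in zip(vec1, vec2))
    magnitude1 = sum(a * a for a in vec1) ** 0.5
    magnitude2 = sum(b * b for b in vec2) ** 0.5
    if magnitude1 == 0 or magnitude2 == 0:
        return 0.0
    return dot_product / (magnitude1 * magnitude2)


def rank_by_similarity(
    query_vector: List[float], rows: List[Dict], limit: int = 10
) -> List[Tuple[int, float]]:
    """Return (property id, score) pairs, highest score first.

    `rows` is a hypothetical list of dicts with `id` and `embedding_vector`
    keys, where `embedding_vector` is a JSON string or None.
    """
    scored = []
    for row in rows:
        if not row.get("embedding_vector"):
            continue  # skip listings that have no stored embedding
        stored = json.loads(row["embedding_vector"])
        scored.append((row["id"], cosine_similarity(query_vector, stored)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


rows = [
    {"id": 1, "embedding_vector": json.dumps([1.0, 0.0])},
    {"id": 2, "embedding_vector": json.dumps([0.0, 1.0])},
    {"id": 3, "embedding_vector": None},
]
ranked = rank_by_similarity([1.0, 0.0], rows)
```

Scoring in application code like this scans every stored vector, which is workable for small catalogs but grows linearly with the number of listings.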
```python
from datetime import timedelta

from flask_jwt_extended import create_access_token, create_refresh_token

# Token lifetimes
JWT_ACCESS_TOKEN_EXPIRES = timedelta(hours=24)
JWT_REFRESH_TOKEN_EXPIRES = timedelta(days=30)


class User(db.Model):
    def get_tokens(self):
        """Generate JWT tokens for user"""
        access_token = create_access_token(identity=str(self.id))
        refresh_token = create_refresh_token(identity=str(self.id))
        return {
            'access_token': access_token,
            'refresh_token': refresh_token
        }
```
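```python
from flask_cors import CORS
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

# Rate limiting, keyed by client IP address
limiter = Limiter(
    key_func=get_remote_address,
    default_limits=["200 per day", "50 per hour"]
)

# CORS: only these origins may call the API from a browser
CORS(app, origins=[
    "http://localhost:3000",
    "https://try.homester.in"
])


# Input validation helpers
def validate_email(email: str) -> bool:
    """Validate email format using regex"""


def validate_password(password: str) -> bool:
    """Validate password strength requirements"""
```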
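The two validators above are shown as signatures only. As a rough sketch of what their bodies might contain, the example below uses a deliberately simple email pattern and an assumed password policy (at least 8 characters, with a letter and a digit). Both the pattern and the policy are assumptions for illustration; the source does not state Homester's actual rules.

```python
import re

# Deliberately simple pattern: one "@", no whitespace, a dot in the domain.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_email(email: str) -> bool:
    """Return True when the address has a plausible local@domain.tld shape."""
    return bool(EMAIL_PATTERN.match(email))


def validate_password(password: str) -> bool:
    """Assumed policy: at least 8 characters with a letter and a digit."""
    return (
        len(password) >= 8
        and any(ch.isalpha() for ch in password)
        and any(ch.isdigit() for ch in password)
    )
```

A format check like this only filters obvious typos; confirming that an address is real still requires a verification email, which the `is_verified` column in `users` appears to anticipate.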
```
1. User Request
   ├── GET /api/properties/search?min_price=300000&bedrooms=3
   └── Authorization: Bearer

2. Authentication Middleware
   ├── Validate JWT token
   ├── Extract user identity
   └── Authorize request

3. Input Validation
   ├── Validate query parameters
   ├── Sanitize input values
   └── Apply default limits

4. Service Layer
   ├── PropertyService.search_properties()
   ├── Build SQLAlchemy query with filters
   └── Execute with pagination

5. Database Query
   ├── SELECT * FROM properties
   ├── WHERE price >= 300000 AND bedrooms = 3
   └── ORDER BY created_at DESC LIMIT 20

6. Response Formation
   ├── Serialize property objects
   ├── Add pagination metadata
   └── Return JSON response
```
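The Input Validation stage in the flow above turns raw query-string values into typed filters and applies default limits. A minimal sketch of that stage follows. The function name `parse_search_params`, the `page` and `per_page` parameter names, and the upper bound of 100 results per page are assumptions; only the default of 20 comes from the `LIMIT 20` in the flow.

```python
from typing import Any, Dict, Mapping

DEFAULT_PER_PAGE = 20   # matches the LIMIT 20 shown in the flow above
MAX_PER_PAGE = 100      # assumed upper bound, not taken from the source


def parse_search_params(args: Mapping[str, str]) -> Dict[str, Any]:
    """Convert raw query-string values into typed, bounded search filters."""
    filters: Dict[str, Any] = {}
    for name in ("min_price", "bedrooms"):
        raw = args.get(name)
        if raw is None or raw == "":
            continue  # parameter not supplied, so no filter is applied
        try:
            value = int(raw)
        except ValueError:
            raise ValueError(f"{name} must be an integer") from None
        if value < 0:
            raise ValueError(f"{name} must not be negative")
        filters[name] = value

    # Clamp pagination so a client cannot request an unbounded page.
    page = max(int(args.get("page", 1)), 1)
    per_page = min(max(int(args.get("per_page", DEFAULT_PER_PAGE)), 1), MAX_PER_PAGE)
    return {"filters": filters, "page": page, "per_page": per_page}


params = parse_search_params(
    {"min_price": "300000", "bedrooms": "3", "per_page": "500"}
)
```

In a Flask view, `request.args` could be passed directly as `args`, and a raised `ValueError` would map to a 400 response.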
```
1. User Message
   ├── POST /api/chat/chat
   ├── {"message": "I need a house in Texas under $500k"}
   └── Authorization: Bearer

2. Message Processing
   ├── Save user message to database
   ├── Retrieve conversation history
   └── Prepare context for AI

3. AI Analysis
   ├── Send conversation to OpenAI GPT-4o
   ├── Extract search intent and parameters
   └── Generate conversational response

4. Property Search
   ├── Generate embedding for user query
   ├── Calculate similarity with property embeddings
   ├── Rank results by relevance score
   └── Fallback to traditional search if needed

5. Response Generation
   ├── Combine AI response with property data
   ├── Save assistant message to database
   └── Return structured response with properties
```
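The Property Search stage above tries semantic matching first and falls back to traditional filtering when needed. The sketch below shows one possible shape for that decision. It rests on assumptions: `find_properties`, the injected search callables, the `score` key, and the `MIN_SCORE` cutoff of 0.3 are all hypothetical, since the source does not say when the fallback triggers.

```python
from typing import Callable, Dict, List

MIN_SCORE = 0.3  # assumed relevance cutoff, not taken from the source


def find_properties(
    query: str,
    filters: Dict,
    semantic_search: Callable[[str], List[Dict]],
    filter_search: Callable[[Dict], List[Dict]],
) -> Dict:
    """Try semantic search first, then fall back to filter-based search."""
    try:
        hits = [
            hit for hit in semantic_search(query)
            if hit.get("score", 0.0) >= MIN_SCORE
        ]
    except Exception:
        hits = []  # e.g. the embedding API is unavailable
    if hits:
        return {"source": "semantic", "properties": hits}
    return {"source": "filters", "properties": filter_search(filters)}


def weak_semantic(_query: str) -> List[Dict]:
    """Stub that returns only a low-scoring match."""
    return [{"id": 7, "score": 0.1}]


def by_filters(_filters: Dict) -> List[Dict]:
    """Stub for the traditional filtered query."""
    return [{"id": 42}]


result = find_properties(
    "house in Texas under $500k",
    {"state": "TX", "max_price": 500000},
    weak_semantic,
    by_filters,
)
```

Passing the two search functions as arguments keeps the fallback rule testable without a database or an OpenAI key.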
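```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first so this layer is cached between code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 5000

# Note: --reload is intended for development; drop it for production images
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "--reuse-port", "--reload", "main:app"]
```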
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: homester-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: homester-api
  template:
    metadata:
      labels:
        app: homester-api
    spec:
      containers:
        - name: api
          image: homester-api:latest
          ports:
            - containerPort: 5000
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: homester-secrets
                  key: database-url
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: homester-secrets
                  key: openai-api-key
```
```yaml
Infrastructure:
  Web Server: Nginx (reverse proxy, SSL termination)
  Application Server: Gunicorn (multiple workers)
  Database: PostgreSQL (primary + read replicas)
  Cache: Redis (session storage, rate limiting)
  CDN: CloudFront (static assets)
  Monitoring: Prometheus + Grafana
  Logging: ELK Stack (Elasticsearch, Logstash, Kibana)

Scaling Strategy:
  Horizontal: Multiple API server instances
  Database: Read replicas for queries
  Cache: Redis cluster for high availability
  CDN: Global distribution for assets
```
```python
# Connection pool settings
SQLALCHEMY_ENGINE_OPTIONS = {
    "pool_recycle": 300,    # Recycle connections every 5 minutes
    "pool_pre_ping": True,  # Validate connections before use
    "pool_size": 10,        # Base connection pool size
    "max_overflow": 20      # Maximum additional connections
}


def search_properties(filters, page=1, per_page=20):
    """Optimized property search with indexed columns"""
    query = Property.query

    # Use indexed columns for filtering
    if filters.get('min_price'):
        query = query.filter(Property.price >= filters['min_price'])

    if filters.get('bedrooms'):
        query = query.filter(Property.bedrooms == filters['bedrooms'])

    # Pagination for large result sets
    return query.paginate(page=page, per_page=per_page)
```
```python
# Cache results for 5 minutes
@cache.memoize(timeout=300)
def get_property_statistics():
    """Cache expensive statistical queries"""


# Redis-backed storage for rate limit counters
RATELIMIT_STORAGE_URL = "redis://localhost:6379/1"
```
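To make the effect of a 300-second memoization timeout concrete, the sketch below implements a small time-based cache with only the standard library. It is an illustration of the general technique, not how the caching extension used above works internally: `ttl_memoize` is a hypothetical decorator, it keeps results in process memory rather than Redis, and it supports positional hashable arguments only.

```python
import time
from functools import wraps
from typing import Any, Callable, Dict, Tuple


def ttl_memoize(timeout: float, clock: Callable[[], float] = time.monotonic):
    """Cache results per positional-argument tuple for `timeout` seconds."""

    def decorator(func):
        store: Dict[Tuple, Tuple[float, Any]] = {}

        @wraps(func)
        def wrapper(*args):
            now = clock()
            cached = store.get(args)
            # Serve the cached value while it is younger than the timeout.
            if cached is not None and now - cached[0] < timeout:
                return cached[1]
            value = func(*args)
            store[args] = (now, value)
            return value

        return wrapper

    return decorator
```

The injectable `clock` parameter exists so expiry can be tested without sleeping. An in-process cache like this is not shared between Gunicorn workers, which is one reason a shared store such as Redis is listed in the stack.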
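```python
import asyncio
from typing import List


async def generate_embeddings_batch(properties: List[Property]):
    """Async batch processing for embeddings"""
    # generate_embedding must be a coroutine function for gather to run calls concurrently
    tasks = [generate_embedding(prop.description) for prop in properties]
    return await asyncio.gather(*tasks)
```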
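Launching one request per property at once can exceed API rate limits on large batches. The sketch below bounds the number of in-flight calls with `asyncio.Semaphore`. It is a sketch under assumptions: `embed_in_batches`, the `max_concurrency` default of 5, and `fake_embed` (a stand-in for a real embedding call) are hypothetical names and values, not part of the Homester codebase.

```python
import asyncio
from typing import Awaitable, Callable, List


async def embed_in_batches(
    texts: List[str],
    embed: Callable[[str], Awaitable[List[float]]],
    max_concurrency: int = 5,
) -> List[List[float]]:
    """Embed all texts concurrently, with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def embed_one(text: str) -> List[float]:
        async with semaphore:
            return await embed(text)

    # gather preserves input order, so results line up with `texts`.
    return await asyncio.gather(*(embed_one(t) for t in texts))


async def fake_embed(text: str) -> List[float]:
    """Stand-in for a real embedding call; returns the text length as a vector."""
    await asyncio.sleep(0)
    return [float(len(text))]


vectors = asyncio.run(
    embed_in_batches(["loft", "bungalow"], fake_embed, max_concurrency=1)
)
```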
Backend Structure:
├── api/ # API endpoints (Flask-RESTX namespaces)
│ ├── __init__.py
│ ├── auth.py # Authentication endpoints
│ ├── properties.py # Property management
│ └── chat.py # AI chat interface
├── services/ # Business logic layer
│ ├── __init__.py
│ ├── ai_service.py # OpenAI integration
│ ├── property_service.py # Property operations
│ └── embedding_service.py # Vector embeddings
├── utils/ # Utility functions
│ ├── __init__.py
│ ├── auth.py # Authentication helpers
│ └── validators.py # Input validation
├── models.py # SQLAlchemy models
├── config.py # Configuration management
├── app.py # Flask application factory
└── main.py # Application entry point
```python
@api.errorhandler(ValidationError)
def handle_validation_error(error):
    return {'message': 'Validation failed', 'errors': error.messages}, 400


@api.errorhandler(NotFound)
def handle_not_found(error):
    return {'message': 'Resource not found'}, 404


@api.errorhandler(Unauthorized)
def handle_unauthorized(error):
    return {'message': 'Authentication required'}, 401
```
```python
def test_property_search():
    """Test property search functionality"""
    filters = {'min_price': 300000, 'bedrooms': 3}
    result = PropertyService.search_properties(filters)
    assert len(result['properties']) > 0


def test_auth_endpoint(client):
    """Test user registration endpoint"""
    response = client.post('/api/auth/register', json={
        'email': 'user@example.com',
        'password': 'securepassword123'
    })
    assert response.status_code == 201
```
```python
import os


class Config:
    # Database
    SQLALCHEMY_DATABASE_URI = os.environ.get('DATABASE_URL')

    # Authentication
    JWT_SECRET_KEY = os.environ.get('JWT_SECRET_KEY')

    # AI Services
    OPENAI_API_KEY = os.environ.get('OPENAI_API_KEY')

    # Performance
    SQLALCHEMY_ENGINE_OPTIONS = {
        "pool_recycle": 300,
        "pool_pre_ping": True,
        "pool_size": int(os.environ.get('DB_POOL_SIZE', 10))
    }
```
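Since `os.environ.get` returns `None` for unset variables, a missing `DATABASE_URL` or secret would otherwise surface only when the first request touches it. The following sketch checks the variables named in `Config` at startup. The helper names `missing_settings` and `pool_size` are hypothetical, and treating all three variables as required is an assumption.

```python
import os
from typing import List, Mapping

# Variables read by Config above; treating all three as required is an assumption.
REQUIRED_VARS = ("DATABASE_URL", "JWT_SECRET_KEY", "OPENAI_API_KEY")


def missing_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def pool_size(env: Mapping[str, str] = os.environ, default: int = 10) -> int:
    """Read DB_POOL_SIZE, falling back to the default when it is unset."""
    return int(env.get("DB_POOL_SIZE", default))
```

An application factory could call `missing_settings()` before creating the app and raise a `RuntimeError` listing the names it returns.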
This architecture provides a robust, scalable foundation for the Homester real estate platform, combining traditional REST API patterns with modern AI capabilities while maintaining production-ready performance and security standards.