# CyberParks Company Information Scraper
A Python web scraping tool to extract company information from the CyberParks website and save it to a CSV file.
## 📋 Features
- ✅ Scrapes company names, websites, and leadership information
- ✅ Automatically creates timestamped CSV files
- ✅ Avoids duplicate entries
- ✅ Real-time progress display
- ✅ Visits company websites to find leadership info
- ✅ Error handling and graceful failures
- ✅ Respects rate limits with delays between requests
## 🎯 Extracted Information
The scraper extracts the following information for each company:
- **Company Name** - The official company name
- **Website** - The company website URL
- **MD/CEO/Chairman** - The leadership name, when one can be found on the company website
## 🛠️ Requirements
### Python Version
- Python 3.6 or higher
### Dependencies
```bash
pip install requests beautifulsoup4
```
Or install from `requirements.txt`:

```bash
pip install -r requirements.txt
```
**requirements.txt**

```text
requests>=2.31.0
beautifulsoup4>=4.12.0
```
## 📦 Installation
1. **Clone or download the script**

   ```bash
   # Create a project directory
   mkdir cyberparks-scraper
   cd cyberparks-scraper
   ```

2. **Save the script** - save the Python code as `scraper.py`

3. **Install dependencies**

   ```bash
   pip install requests beautifulsoup4
   ```
## 🚀 Usage
### Basic Usage
Simply run the script:
```bash
python scraper.py
```
### What Happens

1. The script connects to https://cyberparks.in/companies-at-park/
2. Extracts company names and websites from the main page (see the sketch below)
3. Visits each company's website to find leadership information
4. Displays real-time progress for each company
5. Creates a CSV file with all extracted data
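Under the hood this is a standard requests + BeautifulSoup loop. A minimal sketch of steps 1-2, assuming the directory exposes each company as a plain link - the real selectors live in `scraper.py` and may differ:

```python
import requests
from bs4 import BeautifulSoup

URL = "https://cyberparks.in/companies-at-park/"
HEADERS = {"User-Agent": "Mozilla/5.0"}  # a browser-like UA avoids trivial blocks

def fetch_companies():
    """Fetch the directory page and yield (name, website) pairs."""
    response = requests.get(URL, headers=HEADERS, timeout=15)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    seen = set()
    # Assumption: each company entry is an anchor tag with an external href.
    for link in soup.find_all("a", href=True):
        name = link.get_text(strip=True)
        href = link["href"]
        if name and href.startswith("http") and name not in seen:
            seen.add(name)
            yield name, href
```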
### Output
The script creates a CSV file named `cyberparks_companies_YYYYMMDD_HHMMSS.csv`.

Example: `cyberparks_companies_20250929_143022.csv`
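The timestamp in the filename can be produced with `datetime.strftime`, roughly like this:

```python
from datetime import datetime

# e.g. cyberparks_companies_20250929_143022.csv
filename = f"cyberparks_companies_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv"
```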
### Sample Output Format
```csv
Company Name,Website,MD/CEO/Chairman
Codilar Technologies Pvt.Ltd,https://www.codilar.com,Mahaveer Devabalan
ABANA Technology Private Limited,http://www.abanatechnology.com,John Smith
Analystor Technologies,http://www.analystortech.com,
```
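A sketch of how such a file could be written with the standard `csv` module, assuming the scraped rows are dicts keyed by the header names (the actual script may structure this differently):

```python
import csv

FIELDS = ["Company Name", "Website", "MD/CEO/Chairman"]

def save_csv(companies, filename):
    """Write scraped rows to CSV; missing leadership stays an empty cell."""
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, restval="")
        writer.writeheader()
        writer.writerows(companies)
```

`DictWriter`'s `restval=""` fills any missing key with an empty string, which is what produces the trailing empty column in the Analystor Technologies row above.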
## 📊 Console Output
The script provides detailed console output:
```text
======================================================================
CyberParks Company Information Scraper
Extracting: Company Name, Website.
======================================================================

Fetching data from https://cyberparks.in/companies-at-park/...
✓ Page loaded successfully
Found 150 potential entries. Processing...

1. Codilar Technologies Pvt.Ltd
   Website: https://www.codilar.com
   → Checking website for leadership info...
   ✓ Leadership: Mahaveer Devabalan

2. ABANA Technology Private Limited
   Website: http://www.abanatechnology.com
   → Checking website for leadership info...
   ✗ Leadership: Not found

...

======================================================================
✓ Successfully scraped 72 companies!
✓ Data saved to: cyberparks_companies_20250929_143022.csv

Statistics:
- Total companies: 72
- With Leadership info: 45 (62%)
- With Website: 72 (100%)
======================================================================

Scraping completed!
```
## ⚙️ Configuration
### Timeout Settings
You can adjust the timeout for HTTP requests:
```python
response = requests.get(url, headers=headers, timeout=15)  # Change timeout value
```
### Rate Limiting
The script includes a 1-second delay between company website visits:
```python
time.sleep(1)  # Adjust delay as needed
```
### Maximum Companies
To limit the number of companies scraped:
```python
if len(companies) >= 72:  # Change this number
    print("Reached limit. Stopping...")
    break
```
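If you tune these values often, one option is to hoist them into module-level constants near the top of `scraper.py`. This is a suggested refactor with illustrative names, not how the script currently reads:

```python
# Suggested constants; the names are illustrative, not from the original script.
REQUEST_TIMEOUT = 15   # seconds per HTTP request
REQUEST_DELAY = 1      # seconds between company website visits
MAX_COMPANIES = 72     # stop after this many companies (None for no limit)
```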
## 🔧 Troubleshooting
**Issue: No companies found**

Solution:
- Check your internet connection
- Verify the website URL is accessible
- The website structure may have changed - inspect `debug_page.html` if it was created (see the dump sketch below)
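If the script does not already write `debug_page.html` for you, a short dump after the fetch makes inspection easy (a sketch using the same requests call as the scraper):

```python
import requests

response = requests.get("https://cyberparks.in/companies-at-park/", timeout=15)

# Save the raw HTML so the selectors can be inspected in a browser or editor.
with open("debug_page.html", "w", encoding="utf-8") as f:
    f.write(response.text)
```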
**Issue: No leadership information extracted**

Solution:
- Leadership info might not be publicly available on company websites (extraction is a best-effort heuristic - see the sketch below)
- Check if the company websites are accessible
- Some companies may not list executive information online
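For context, leadership extraction from arbitrary company pages is usually a keyword heuristic along these lines; this is a sketch of the general technique, not the exact logic in `scraper.py`:

```python
import re
from bs4 import BeautifulSoup

# Capture 2-4 capitalized words immediately before a leadership title.
TITLE_PATTERN = re.compile(
    r"([A-Z][a-z]+(?:\s+[A-Z][a-z]+){1,3})\s*[,\-]?\s*"
    r"(?:MD|CEO|Chairman|Managing Director|Chief Executive)"
)

def find_leadership(html):
    """Return the first name found next to a leadership title, or ''."""
    text = BeautifulSoup(html, "html.parser").get_text(" ", strip=True)
    match = TITLE_PATTERN.search(text)
    return match.group(1) if match else ""
```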
**Issue: Connection timeout**

Solution:

```python
# Increase the timeout value
response = requests.get(url, headers=headers, timeout=30)
```
**Issue: Too many requests / rate limiting**

Solution:

```python
# Increase the delay between requests
time.sleep(2)  # or higher
```
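A fixed larger delay works; a somewhat more robust alternative is exponential backoff on failed requests (a sketch, not something the current script includes):

```python
import time
import requests

def get_with_backoff(url, headers=None, retries=3, base_delay=2):
    """Retry a GET with exponentially growing pauses between attempts."""
    for attempt in range(retries):
        try:
            response = requests.get(url, headers=headers, timeout=30)
            response.raise_for_status()
            return response
        except requests.RequestException:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # pause 2s, 4s, 8s, ...
```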
## 📝 Legal & Ethical Considerations
### Important Notes
- **Terms of Service**: Always check the website's Terms of Service before scraping
- **robots.txt**: Respect the website's robots.txt file (see the pre-flight check after this list)
- **Rate Limiting**: The script includes delays to avoid overwhelming the server
- **Data Usage**: Use scraped data responsibly and in accordance with privacy laws
- **Personal Data**: Be mindful of personal information (executive names) and comply with GDPR/data protection laws
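The robots.txt check is easy to automate with the standard library's `urllib.robotparser`; a minimal pre-flight check might look like this:

```python
from urllib import robotparser

# Parse the site's robots.txt and test the target path before scraping.
rp = robotparser.RobotFileParser()
rp.set_url("https://cyberparks.in/robots.txt")
rp.read()

if rp.can_fetch("*", "https://cyberparks.in/companies-at-park/"):
    print("Allowed by robots.txt - safe to proceed")
else:
    print("Disallowed by robots.txt - do not scrape this page")
```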
### Best Practices
- ✅ Run the scraper during off-peak hours
- ✅ Use reasonable delays between requests
- ✅ Don't run the scraper too frequently
- ✅ Cache results to avoid repeated scraping
- ✅ Respect website bandwidth and server resources
## 🐛 Known Limitations
- **Dynamic Content**: May not work with JavaScript-heavy websites (would need Selenium - see the sketch after this list)
- **Authentication**: Cannot access pages requiring login
- **CAPTCHA**: Cannot bypass CAPTCHA protection
- **Structure Changes**: Will need updates if the website structure changes
- **Leadership Info**: Not all companies publicly list executive information
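If the directory ever moves behind JavaScript rendering, a Selenium-based fetch would look roughly like the sketch below. It requires `pip install selenium` and a local Chrome install, and is not part of the current script:

```python
from selenium import webdriver
from bs4 import BeautifulSoup

options = webdriver.ChromeOptions()
options.add_argument("--headless=new")  # run without opening a browser window

driver = webdriver.Chrome(options=options)
try:
    driver.get("https://cyberparks.in/companies-at-park/")
    # page_source contains the DOM after JavaScript has run.
    soup = BeautifulSoup(driver.page_source, "html.parser")
    print(soup.title.string if soup.title else "no title")
finally:
    driver.quit()
```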
## 🔄 Updates & Maintenance
If the website structure changes:
1. Run the script to generate `debug_page.html`
2. Inspect the HTML structure
3. Update the CSS selectors and extraction patterns (see the sketch below)
4. Test with a small subset of companies first
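When updating selectors (step 3), `soup.select` is the usual entry point. The selectors below are purely hypothetical, shown only to illustrate where the change goes:

```python
from bs4 import BeautifulSoup

def parse_companies(html):
    """Parse company entries via CSS selectors - update these to match
    whatever structure debug_page.html actually shows."""
    soup = BeautifulSoup(html, "html.parser")
    # Hypothetical selectors: a card div holding a heading and a link.
    for card in soup.select("div.company-card"):
        name_tag = card.select_one("h3")
        link_tag = card.select_one("a[href]")
        if name_tag and link_tag:
            yield name_tag.get_text(strip=True), link_tag["href"]
```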
## 📧 Support
If you encounter issues:
- Check the console output for error messages
- Verify all dependencies are installed
- Ensure Python version is 3.6+
- Check internet connectivity
- Verify the target website is accessible
## 📄 License
This script is provided as-is for educational purposes. Use responsibly and ethically.
## 🙏 Acknowledgments
- Built with Python, Requests, and BeautifulSoup4
- Designed for CyberParks company directory
**Version**: 1.0.0
**Last Updated**: September 29, 2025
**Python Version**: 3.6+