Data Extraction and Processing Analyst


IDC CEMA - Tracker Webscraping and Data Harvesting Team in Ostrava is GROWING!


As a member of the team that develops web scraper, bot, crawler, and machine learning solutions, you will be a key part of our ongoing market research initiatives. You will use ParseHub or Mozenda and other tools in a DWH environment on a daily basis.


Your solutions will gather web pricing and product spec information for hundreds of technology products from thousands of websites around the world. You will also design algorithms to clean and classify the data into the IDC taxonomy for analyst review and analysis.


  • Figure out optimal solutions for various data sources
  • Keep up with the latest web scraping and crawling technology
  • Work with market analyst teams to understand their needs and the analyzed markets
  • Create and maintain thousands of agents, schedule them, aggregate data sets, clean text and HTML data, and integrate output into other databases
  • Design databases and implement algorithms for data cleansing
  • Integrate with the IDC research application and cooperate with development teams


  • Good communication skills
  • Analytical mind
  • Happy to learn and explore new technology / tools
  • Passion for web and data analysis and processing
  • Ability to communicate reasonably well in English
  • Relevant B.S. or Master's degree
  • Minimum 3 years of experience with databases/big data and SQL programming
  • Knowledge of ETL tools

Nice to have skills:

    • Experience with web scraping and/or web crawling tools
    • Python
    • Machine learning knowledge
    • Experience with data lifecycle management
    • OLAP
    • Data Mining

 Please note that this position is not suitable for fresh graduates.


IDC offers interesting work with a young international team, the opportunity to participate in a developing and innovative field, professional growth, and a competitive remuneration package.

