15 open source tools that every data scientist should learn
A versatile and widely used programming language with a vast ecosystem of data science libraries.
Python:
A powerful language for statistical computing and graphics.
R:
A standard language for querying and manipulating data in relational databases.
SQL:
A web application for creating and sharing documents that contain live code, equations, visualizations, and narrative text.
Jupyter Notebook:
A Python library for data manipulation and analysis.
Pandas:
A Python library for fast numerical computing with multidimensional arrays.
NumPy:
A Python library for creating static, animated, and interactive visualizations.
Matplotlib:
A Python library for machine learning.
Scikit-learn:
An open source deep learning framework from Google, used mainly through its Python API.
TensorFlow:
An open source deep learning framework, originally developed at Meta AI, known for its dynamic computation graphs.
PyTorch:
A unified analytics engine for large-scale data processing.
Spark:
A distributed computing framework for processing large datasets.
Hadoop:
A platform for programmatically scheduling and monitoring workflows.
Airflow:
A tool for transforming data inside a data warehouse using SQL.
dbt (data build tool):
A workflow orchestration tool for managing data pipelines.
Prefect:
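
To give a feel for how two of the tools above fit together, here is a minimal sketch that runs a SQL query from Python using the standard library's sqlite3 module. The orders table, its columns, and its values are made up for illustration; in practice the same pattern would point at a real database.

```python
import sqlite3

# Hypothetical in-memory table; the names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("north", 1.5), ("north", 2.5), ("south", 4.0)],
)

# SQL does the grouping and averaging; Python collects the result rows.
rows = conn.execute(
    "SELECT region, AVG(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
conn.close()

# Turn the (region, average) pairs into a dictionary for later use.
averages = dict(rows)
print(averages)
```

From here, the result could be handed to a library such as Pandas for further analysis or to Matplotlib for plotting.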