Starting Over - What would I learn?
- July 21, 2022
I’ve thought this question over many times since the recording, and here is what I should have said.
If I were to start from scratch now, I would start by learning Python.
I didn’t learn Python until I joined Neo4j, and now I feel like I missed out on so much. Python can be used for so many things.
I like to think of Python as a gateway language, and I don’t mean that disparagingly. It’s really easy to learn: there are no curly braces and no semicolons to contend with, so you can concentrate on achieving your goal.
I have friends whose children, not even teenagers yet, are learning to write Python at school. Imagine what these kids will be able to achieve when they get to working age!
The ecosystem around Python is excellent, and there are some great open source libraries that you can use to do a really diverse range of things. Pandas is great for data analysis, and DataFrames allow you to perform some pretty complex actions. Once you have manipulated that data, you can quickly visualise it in a multitude of ways with matplotlib in a couple of lines of code.
You can do this all inside a Jupyter notebook and share the results with anyone or write a tutorial that anyone can recreate.
I’ve spent a lot of time recently producing reporting on GraphAcademy, plotting enrolment numbers over time, identifying trends, and so on. Previously, I would have tried to over-engineer a solution here by creating an SPA with Vue or React and plotting the data with a library like Highcharts, Chart.js or Nivo Charts, but that is extremely labour intensive.
It’s much easier to do this in Python. Jupyter notebooks allow you to wrap code with comments and markdown so you can tell a story as you go and make things easy to understand.
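As a rough sketch of what that kind of report could look like in a notebook cell, here is a small pandas example. The column names (`course`, `enrolled_at`), the course labels and the sample rows are all made up for illustration; real data would come from a database query or an export.

```python
import pandas as pd


def monthly_enrolments(enrolments: pd.DataFrame) -> pd.Series:
    """Count enrolments per calendar month from one-row-per-enrolment data."""
    dates = pd.to_datetime(enrolments["enrolled_at"])
    # to_period("M") maps every date to its month, e.g. 2022-01-05 -> 2022-01.
    return dates.dt.to_period("M").value_counts().sort_index()


def plot_enrolments(counts: pd.Series):
    """Draw the monthly counts as a line chart.

    pandas hands the drawing over to matplotlib, so matplotlib must be installed.
    """
    ax = counts.plot(marker="o", title="Enrolments per month")
    ax.set_ylabel("Enrolments")
    return ax


# Hypothetical sample data: one row per enrolment.
sample = pd.DataFrame(
    {
        "course": ["course-a", "course-b", "course-a", "course-b", "course-a", "course-a"],
        "enrolled_at": [
            "2022-01-05",
            "2022-01-20",
            "2022-02-11",
            "2022-03-02",
            "2022-03-15",
            "2022-03-30",
        ],
    }
)

counts = monthly_enrolments(sample)
```

In a Jupyter notebook, calling `plot_enrolments(counts)` in the next cell would render the chart inline, and a markdown cell above it can explain what the reader is looking at.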
Python for AI/ML
When we talk about building APIs with Node.js, the rationale is that you use the same language in the backend as in the UI. The same rationale applies to Python and Machine Learning.
I’ve only scratched the surface of building machine learning models with scikit-learn and TensorFlow (my favourite is a Random Forest for predicting customer churn based on graph features) but these seem really straightforward to pick up. You can use your trained models directly in a Flask API (or FastAPI, etc.) and run the algorithms in real time.
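To make that concrete, here is a minimal sketch of a scikit-learn Random Forest served from a Flask route. Everything about the data is an assumption for illustration: the two features are random numbers standing in for real (for example graph-derived) features, the churn label follows an invented rule, and the `/predict` route and `features` payload key are hypothetical names.

```python
import numpy as np
from flask import Flask, jsonify, request
from sklearn.ensemble import RandomForestClassifier

# Synthetic training data: 200 customers with two made-up numeric features.
rng = np.random.default_rng(42)
X = rng.random((200, 2))
# Invented labelling rule: customers with a low first feature churn (class 1).
y = (X[:, 0] < 0.3).astype(int)

# Train once at startup; a real service would more likely load a saved model.
model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X, y)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    """Return a churn probability for one customer's feature vector."""
    payload = request.get_json()
    features = [payload["features"]]  # scikit-learn expects a 2D array
    # Column 1 of predict_proba is the probability of class 1 (churn).
    probability = model.predict_proba(features)[0][1]
    return jsonify({"churn_probability": float(probability)})
```

Calling `app.run()` would start Flask's development server, after which a POST to `/predict` with a JSON body such as `{"features": [0.1, 0.5]}` returns the model's churn probability for that customer.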
In a Microservices Architecture
If you still want to write your APIs in Node.js or TypeScript, then you can use a microservices architecture or even GraphQL resolvers to combine the machine learning elements with a more traditional API.
With TypeScript, you don’t need to go all-in and define types and interfaces for absolutely everything. You can instead just sprinkle in the TypeScript features where they are needed. Where and when you use these types will come with experience.
I would go as far as to say that TypeScript has saved hours of debugging time and cut out a lot of silly mistakes.
Has React won the battle?
Over the years, I’ve jumped between front-end frameworks. One of my first Angular 1.something projects written around is still in production at a company. I tried earlier versions of React with class components, componentDidMount(), etc. and hated it. Global state management was a mess.
For a while I had a strong preference for Vue, but the migration from version 2 to 3 has been a bit of a mess. I experimented with the Composition API in 2020, when the concept was first introduced, but it still doesn’t seem to be widely adopted.
From my experience consulting for Neo4j, and within Neo4j itself, it seems that React has won the battle.
Since React hooks and functional components were introduced, it has also become a lot nicer to use. I mean, sure, Redux and JSX can still be a bit of a hot mess at times but there are steps and architecture decisions you can take to mitigate this.
After 6 months, what project doesn’t feel like a mess anyway?!
Data? That’s less clear to me
Maybe I’m a little too close to the subject, but I don’t see a one-size-fits-all solution for data. Although I work with graph databases on a daily basis, I’m still of the opinion that you should use the best tool for the job. Instead of one definitive answer, let me throw out a few thoughts:
Relational databases are still cool - I still love a relational database and they’re not going anywhere any time soon. I would recommend taking a look at PostgreSQL if you want an open-source relational database.
NoSQL is a broad term that covers a lot of database types: document stores, key-value stores, graphs, wide-column stores and multi-model databases. They all have their own strengths and weaknesses, and it’s worth at least understanding what those are for each type. Beware, there can be a lot of hype and hyperbole.
Data Mobility is Important, especially in more complex architectures. Understanding how streams of data can be consumed between applications is a great skill to have. Kafka would be my #1 choice for this.
Graphs are Everywhere - I’m paid to have this opinion, it’s true, so feel free to take it with a pinch of salt, but connections are becoming increasingly important to AI & ML.
I would have no reservations about using Neo4j as the primary database in any new project I start. In my experience it is as performant as a relational database, and ACID transactions mean the data is just as safe. Ironically, relational databases are bad at handling relationships, so as the data complexity grows and more table joins are needed, a graph comes into its own: the relationships are already stored with the data, so you have paid upfront for the benefits of querying a graph.
Add to that the fact that they’re pretty fun to work with and you’re on to a winner.
This is by no means an exhaustive list, but here are some of the resources I would recommend if you are getting started with anything I have mentioned above.
- www.learnpython.org is a great place
- TypeScript in 50 Lessons by Stefan Baumgartner is the perfect place to learn everything you need to know about TypeScript.