Show HN: Vicinity – Fast, Lightweight Nearest Neighbors with Flexible Back Ends
11 by Pringled | 0 comments on Hacker News. We’ve just open-sourced Vicinity, a lightweight approximate nearest neighbors (ANN) search package that makes it fast to experiment with and compare a large number of well-known algorithms. Main features:

- Lightweight: the base package depends only on NumPy
- Unified interface: use any of the supported algorithms and backends through a single interface; HNSW, Annoy, FAISS, and many more algorithms and libraries are supported
- Easy evaluation: measure the performance of your backend (queries per second vs. recall) with a simple function
- Serialization: save and load your index for persistence

After working with a large number of ANN libraries over the years, we found it increasingly cumbersome to learn the interface, features, quirks, and limitations of every library. After writing custom evaluation code to measure speed and performance for the hundredth time, we decided to build Vicinity: a unified, simple interface to a large number of algorithms and libraries that allows for quick comparison and evaluation. A sketch of what this looks like in practice is below. We are curious to hear your feedback! Are there any algorithms you use that are missing? Any extra evaluation metrics that would be useful?
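To make the unified-interface claim concrete, here is a minimal sketch of what using such a package can look like. The names (Vicinity, Backend, from_vectors_and_items, query, save, load) are assumptions based on the description above, not a confirmed API; check the repo for the real one.

```python
# Hypothetical sketch of the unified-interface idea described above.
# Class and method names are assumptions, not the confirmed Vicinity API.
import numpy as np
from vicinity import Vicinity, Backend  # assumed import path

vectors = np.random.rand(10_000, 128).astype(np.float32)
items = [f"doc_{i}" for i in range(len(vectors))]

# Same call shape regardless of which backend does the indexing;
# swapping Backend.HNSW for another supported backend is the whole point.
index = Vicinity.from_vectors_and_items(vectors, items, backend_type=Backend.HNSW)

# Query returns the nearest items for a new vector.
results = index.query(np.random.rand(128).astype(np.float32), k=10)

# Persistence: save and reload the index.
index.save("my_index")
restored = Vicinity.load("my_index")
```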
Show HN: Flash Kitty – Archive of Adobe/Macromedia Flash Movies from Flash Kit
9 by gzalo | 1 comment on Hacker News. After realizing a few months ago that the current Flash Kit owners didn't really back up any of the user-submitted movies, and getting some Flash nostalgia, I created this working Flash Kit archive using data from the Wayback Machine/Internet Archive. It uses Ruffle, so you can watch the submitted movies in a modern browser without needing plugins. It's not curated, so you'll find a variety of things; some are really creative and can be used for inspiration.
Launch HN: Midship (YC S24) – Turn PDFs and Images into usable data
12 by maxmaio | 13 comments on Hacker News. Hey HN, we are Max, Kieran, and Aahel from Midship ( https://midship.ai ). Midship makes it easy to extract data from unstructured documents like PDFs and images. Here’s a video showing it in action: https://ift.tt/Zx91mob?... , and a demo playground (no signup required!) to test it out: https://ift.tt/LOPe2mD

We started 5 months ago initially trying to make an AI natural-language workflow builder that would be a simpler alternative to Zapier or Make.com. However, most of our users seemed to be much more interested in the basic (and not very good) document extraction feature we had. Seeing how people were spending hours a day manually extracting data from PDFs inspired us to build what has become Midship!

The problem is that despite all our progress in software, huge amounts of business data still live in PDFs and images. Sure, you can OCR them, but getting clean, structured data out is still painful. Most existing tools just give you a blob of markdown, leaving you to figure out which parts matter and how they relate. We've found that combining OCR with language models lets us do something more useful: extract the specific fields and tables that users actually care about. The LLMs help correct OCR mistakes and understand context (like knowing that "Inv#" and "Invoice Number" mean the same thing). A sketch of what that extraction step can look like is below.

We have two main kinds of users today: non-technical users who extract data via our web app, and developers who use our extraction API. We were initially focused on the first group, as they seemed like an underserved part of the market, but we’ve received a lot of interest from developers who face the same issues. For pricing, we currently charge a monthly SaaS fee per seat for the web app and volume-based pricing for the API. We’re really excited to share what we’ve built so far and look forward to any feedback from the community!
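This is not Midship's actual API, but a minimal sketch of the OCR-plus-LLM pattern described above: hand the raw OCR text and a target schema to an LLM and ask it to map synonymous labels ("Inv#", "Invoice Number") onto your fields. The run_ocr and call_llm helpers are hypothetical stand-ins for whatever OCR engine and model client you use.

```python
import json

# Hypothetical helpers: swap in your actual OCR engine and LLM client.
def run_ocr(path: str) -> str:
    """Return raw text for a PDF/image, e.g. via Tesseract or a cloud OCR API."""
    raise NotImplementedError

def call_llm(prompt: str) -> str:
    """Return the model's completion for the prompt, via an LLM SDK of your choice."""
    raise NotImplementedError

# The fields the user actually cares about, rather than a blob of markdown.
SCHEMA = {
    "invoice_number": "string",   # matches labels like 'Inv#', 'Invoice No.'
    "invoice_date": "YYYY-MM-DD",
    "total_amount": "number",
}

def extract_fields(path: str) -> dict:
    text = run_ocr(path)
    prompt = (
        "Extract the following fields from this document as JSON, "
        "treating synonymous labels (e.g. 'Inv#' vs 'Invoice Number') "
        f"as the same field. Schema: {json.dumps(SCHEMA)}\n\n"
        f"Document text:\n{text}"
    )
    # The LLM also gets a chance here to correct obvious OCR mistakes.
    return json.loads(call_llm(prompt))
```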
Show HN: Cerebellum – Open-Source Browser Control with Claude 3.5 Computer Use
3 by theredsix | 1 comment on Hacker News. Hi HN! I was mesmerized by the Claude Computer Use reveal last week and was specifically impressed by how well it navigated websites. This motivated me to create Cerebellum, a library that lets an LLM take control of a browser. Here is a demo of Cerebellum in action, performing the goal “Find a USB C to C cable that is 10 feet long and add it to cart” on amazon.com: https://youtu.be/xaZbuaWtVkA?si=Tq9lE6BXv9wjZ-qC Currently, it uses Claude 3.5 Sonnet’s newly released computer-use ability, but the ultimate goal is to crowdsource a high-quality set of browser sessions to train an open-source local model. Check out the MIT-licensed repo on GitHub ( https://ift.tt/Sa2x4vD ) or install the library from npm ( https://ift.tt/DFqm9xP ). Looking for feedback from the HN community, especially on: what browser tasks would you use an LLM to complete? Thanks again for taking a look!
Show HN: Donobu – Mac App for Web Automation and Testing
21 by wewtyflakes | 1 comment on Hacker News. Been working on a desktop app for Mac that lets you create web flows and rerun them ( https://www.donobu.com/ ). You can optionally use AI (BYOK: bring your own keys) to create flows for you and to do other interesting things, like making vision-based semantic assertions. Also, your data lives on your own filesystem, and we do not see any of it (further still, there is no phoning home at all). A nice benefit of this being a desktop app rather than a SaaS product is that if you happen to be developing/iterating on a webpage locally, it has no problem hooking into it.

What this intends to be a good fit for:
- Testing web pages, especially locally.
- Exploring random webpages with a stated objective.
- Automating tedious flows. Rerunning a flow won't get caught up on a single selector (many websites randomize element IDs, for instance); there is smart failover using a prioritized list of selectors.
- Getting a quick draft of an end-to-end test in JavaScript.

What this is a bad fit for:
- Mass web scraping (too slow).
- Adversarial websites.

What we are still working out:
- Click-and-drag operations.
- Websites that are primarily controlled from canvas.
- Smoothing out UI/UX (we are two backend engineers trying our best, and are handily outgunned by real frontend engineers).

Fun things to try:
- Asking it to assert that a webpage has a certain theme.
- Asking it to run an accessibility report for a page (uses https://ift.tt/KlAVxT7 ).
- Asking it to run a cookie report for a page.

The tech:
- Java 21 for the main business logic.
- Javalin 6 for the web framework ( https://javalin.io/ ).
- Playwright for controlling the browser ( https://ift.tt/qzIril0 ).
- Axe for running accessibility reports ( https://ift.tt/KlAVxT7 ).

Critical feedback is welcome. Thanks for trying it out! Cheers, -Justin and Vaz
Show HN: Kameo – a Rust library for building fault-tolerant, async actors
26 by tqwewe | 7 comments on Hacker News. Hi HN, I’m excited to share Kameo, a lightweight Rust library that helps you build fault-tolerant, distributed, and asynchronous actors. If you're working on distributed systems, microservices, or real-time applications, Kameo offers a simple yet powerful API for handling concurrency, panic recovery, and remote messaging between nodes.

Key features:
- Async Rust: each actor runs as a separate Tokio task, making concurrency management simple.
- Remote messaging: seamlessly send messages to actors across different nodes.
- Supervision and fault tolerance: create self-healing systems with actor hierarchies.
- Backpressure support: supports bounded and unbounded mpsc messaging.

I built Kameo because I wanted a more intuitive, scalable solution for distributed Rust applications. I’d love feedback from the HN community and contributions from anyone interested in Rust and actor-based systems. Check out the project on GitHub: https://ift.tt/c7zN8Hd Looking forward to hearing your thoughts!
Show HN: EloqKV – Scalable distributed ACID key-value database with Redis API
10 by hubertzhang | 17 comments on Hacker News. We're thrilled to unveil EloqKV, a lightning-fast distributed key-value store with a Redis-compatible API. Built on a new database architecture called the Data Substrate, EloqKV brings significant innovations to database design. Here are the unique features that make it stand out:

- Flexible deployment: run it as a single-node in-memory KV cache, a larger-than-memory database, or scale to a highly available, distributed transactional database with ease.
- High performance: achieves performance levels comparable to top in-memory databases like Redis and DragonflyDB, while significantly outperforming durable KV stores like KVRocks.
- Full ACID transactions: ensures complete transactional integrity, even in distributed environments.
- Independent resource scaling: scale CPU, memory, storage, and logging resources independently to meet your needs.

A sketch of talking to EloqKV from a standard Redis client is below. We’d love to hear your thoughts and feedback!
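Since EloqKV exposes a Redis-compatible API, any off-the-shelf Redis client should be able to talk to it. Here is a minimal sketch using Python's redis-py; the host/port and key names are assumptions for a local single-node deployment.

```python
import redis  # standard Redis client; EloqKV speaks the same protocol

# Connection details are assumptions for a local single-node deployment.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

r.set("user:42:name", "Ada")

# MULTI/EXEC via a pipeline: EloqKV advertises full ACID semantics for
# transactions like this, even in a distributed deployment.
pipe = r.pipeline(transaction=True)
pipe.decrby("account:alice", 100)
pipe.incrby("account:bob", 100)
pipe.execute()

print(r.get("user:42:name"))  # -> Ada
```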
Ask HN: Who wants to be hired? (September 2024)
16 by whoishiring | 85 comments on Hacker News. Share your information if you are looking for work. Please use this format:

Location:
Remote:
Willing to relocate:
Technologies:
Résumé/CV:
Email:

Please only post if you are personally looking for work. Agencies, recruiters, job boards, and so on, are off topic here. Readers: please only email these addresses to discuss work opportunities. There's a site for searching these posts at https://ift.tt/cHTkxVl .
Show HN: "Claude Artifacts" but creating real web apps
21 by antonoo | 9 comments on Hacker News. Hey Hacker News! Launching gptengineer.app into beta today. It's like Claude Artifacts, but:

- you can edit the code in your fav IDE (two-way GitHub sync)
- it installs npm packages
- it automatically picks up build and runtime errors and fixes them
- it's very fast, built with Rust

The full-stack capabilities are built on Supabase (we prefer not to handle auth + user data at this point, so this is owned by the user). The seed for this project was an open-source experiment; I posted about it previously here: https://ift.tt/xo49QMI Would love feedback if you give it a try!
Why don't we have personalized search engines?
18 by enether | 20 comments on Hacker News.

- Search as it is today sucks
- Google is an ad engine, not a search engine
- SEO is gamed all the time

The end result is search results that aren't that valuable. Why isn't there a tool that allows me to:

- search good content I've read
- search curated content (from other people I trust)
- search books and other paid material I have bought
- search my notes (scattered throughout 5 apps)

All in one?
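For what it's worth, the core of such a tool is not exotic: a single full-text index over heterogeneous sources gets you surprisingly far, and the hard part is ingestion from each silo. A minimal sketch in Python using SQLite's FTS5 extension (assuming your Python's sqlite3 build includes FTS5, as most do; source names and sample documents are made up):

```python
import sqlite3

con = sqlite3.connect("personal_search.db")
# One FTS5 table covering every source: read articles, trusted people's
# links, purchased books, and notes from all five apps.
con.execute(
    "CREATE VIRTUAL TABLE IF NOT EXISTS docs USING fts5(source, title, body)"
)

def add(source: str, title: str, body: str) -> None:
    con.execute("INSERT INTO docs VALUES (?, ?, ?)", (source, title, body))
    con.commit()

def search(query: str, limit: int = 10):
    # bm25() ranks matches; lower scores are better in FTS5.
    return con.execute(
        "SELECT source, title FROM docs WHERE docs MATCH ? "
        "ORDER BY bm25(docs) LIMIT ?",
        (query, limit),
    ).fetchall()

add("notes", "Postgres tuning", "notes on shared_buffers and work_mem...")
add("books", "Designing Data-Intensive Applications", "chapter text...")
print(search("shared_buffers"))  # -> [('notes', 'Postgres tuning')]
```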
Show HN: Tree-sitter Integration for Swift
9 by daspoon | 1 comment on Hacker News. I have created a Swift package ( https://ift.tt/zIxydCY ) enabling tree-sitter parsers to be written in Swift; specifically, as an array of production rules that map symbol types to pairings of syntax expression and type constructor. A member macro derives a tree-sitter grammar and embeds the generated parser in its expansion. This project is a work in progress, and I would be grateful for any feedback. Thanks, Dave
Show HN: Denormalized – Embeddable Stream Processing in Rust and DataFusion
24 by ambrood | 4 comments on Hacker News. tl;dr: we built an embeddable stream processing engine in Rust using Apache DataFusion; check us out at https://ift.tt/0TyS8tj

Hey HN, we'd like to showcase a very early version of our embeddable stream processing engine called Denormalized. The rise of DuckDB has made it abundantly clear that even for many workloads of terabyte scale, a single-node system outshines the distributed query engines of the previous generation, such as Spark and Snowflake, in terms of both performance and cost. A lot of the workloads DuckDB is used for were considered "big data" in the previous generation, but no more.

In the context of streaming especially, this problem is more acute. A streaming system is designed to incrementally process large amounts of data over a period of time. Even at the upper end of scale, productionized use cases of stream processing are rarely performing compute on more than tens of gigabytes of data at a given time. Even so, standard stream processing solutions such as Flink involve spinning up a distributed JVM cluster to compute against even the simplest of event streams.

To that end, we're building Denormalized to be embeddable in your applications and to scale up to hundreds of thousands of events per second with a Flink-like dataflow API. While we currently only support Rust, we have plans for Python and TypeScript bindings soon. We're built atop the DataFusion and Arrow ecosystems and currently support streaming joins as well as windowed aggregations on Kafka topics. Please check out our repo at https://ift.tt/0TyS8tj . We'd love to hear your feedback.
Ask HN: How different is AWS/GCP/Azure in everyday work
23 by michal_kluczek | 17 comments on Hacker News. I've almost exclusively been working with GCP for years, with very few occasions when I've created some resources in AWS (I'm managing infra using Terraform). When looking for a job now, it's very common that I'm rejected before the technical interview because I wasn't working with AWS. Is it really so fundamentally different from GCP, or any other cloud provider for that matter? I have a wild feeling that 80-90% of the products all cloud providers offer are the same toys but with different names and integration mechanisms. There are surely some quirks that are exclusive to a specific cloud provider, but are there really so many that they'd stifle your performance?
Ask HN: Best Tools for Monorepo?
8 by bradhe | 4 comments on Hacker News. I've got a monorepo I'm working in that has a Golang backend with a couple of services and a Next.js front-end. Everything lives in the monorepo together. My tooling is super weak, though! For instance, for process management in development I'm using Goreman, which is a Foreman alternative in Golang (a sketch of what that looks like is below). Wondering what's the state of the art for managing processes in local dev in monorepos in 2024? Or other tools for managing a monorepo I might be missing in general!
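For readers unfamiliar with Goreman: it reads the same Procfile format as Foreman, with one line per process to run in local dev. A sketch for a repo like the one described (the service names and paths are hypothetical):

```
# Procfile at the repo root; Goreman, like Foreman, reads this format.
# Run everything with: goreman start
api: go run ./cmd/api
auth: go run ./cmd/auth
web: cd frontend && npm run dev
```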
Show HN: I built an open-source tool to make on-call suck less
19 by aray07 | 3 comments on Hacker News. Hey HN, I am building an open-source platform to make on-call better and less stressful for engineers. We are building a tool that can silence alerts and help with debugging and root cause analysis. We also want to automate the tedious parts of being on-call (running runbooks manually, answering questions on Slack, dealing with PagerDuty). Here is a quick video of how it works: https://youtu.be/m_K9Dq1kZDw

I hated being on-call for a couple of reasons:

* Alert volume: the number of alerts kept increasing over time, and it was hard to maintain existing alerts. This led to a lot of noisy, unactionable alerts. I have lost count of the number of times I got woken up by an alert that auto-resolved 5 minutes later.
* Debugging: debugging an alert or a customer support ticket would require me to gain context on a service I might not have worked on before. These companies used many observability tools, which made debugging challenging. There was always time pressure to resolve issues quickly.

There were some more tangential issues that used to take up a lot of on-call time:

* Support: answering questions from other teams. A lot of the time these questions were repetitive and had been answered before.
* Dealing with PagerDuty: these tools are hard to use. For example, it was hard to schedule an override in PD or set up holiday schedules.

I am building an on-call tool that is Slack-native, since Slack has become the de facto tool for on-call engineers. We heard from a lot of engineers that maintaining good alert hygiene is a challenge. To start off, Opslane integrates with Datadog and can classify alerts as actionable or noisy. We analyze your alert history across various signals:

1. Alert frequency
2. How quickly the alerts have resolved in the past
3. Alert priority
4. Alert response history

A sketch of how such a classifier might combine these signals is below. Our classification is conservative, and it can be tuned as teams get more confidence in the predictions. We want to make sure that you aren't accidentally missing a critical alert. Additionally, we generate a weekly report based on all your alerts to give you a picture of your overall alert hygiene.

What's next?

1. Building more integrations (Prometheus, Splunk, Sentry, PagerDuty) to continue making on-call quality of life better
2. Helping make debugging and root cause analysis easier
3. Runbook automation

We're still pretty early in development and we want to make on-call quality of life better. Any feedback would be much appreciated!
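This is not Opslane's actual implementation, just a sketch of how the four signals listed above could be combined conservatively, with every threshold an illustrative assumption:

```python
from dataclasses import dataclass

@dataclass
class AlertHistory:
    fires_per_week: float        # signal 1: alert frequency
    median_resolve_mins: float   # signal 2: how fast it typically resolves
    priority: int                # signal 3: 1 = highest priority
    ack_rate: float              # signal 4: fraction of fires a human acted on

def classify(h: AlertHistory) -> str:
    """Return 'noisy' only when every signal agrees; default to 'actionable'.

    This mirrors the conservative stance described above: when in doubt,
    treat the alert as real. All thresholds are illustrative assumptions.
    """
    looks_noisy = (
        h.fires_per_week > 10          # fires constantly
        and h.median_resolve_mins < 5  # usually auto-resolves quickly
        and h.priority >= 3            # never high priority
        and h.ack_rate < 0.1           # humans rarely respond to it
    )
    return "noisy" if looks_noisy else "actionable"

# Example: the classic 3 a.m. page that auto-resolves in minutes.
print(classify(AlertHistory(fires_per_week=25, median_resolve_mins=3,
                            priority=4, ack_rate=0.02)))  # -> noisy
```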