latest news

On this website, we bring you all the latest news.

Saturday, 13 May 2023

New top story on Hacker News: Committing changes to a 130GB Git repository without full checkouts [video]

Champ | 13:22 | Hacker News | No comments

6 points by eliomattia | 0 comments on Hacker News.
Hey HN, I would appreciate feedback on a version-control-for-data toolset I am building, creatively called the Data Manager. When working with large repositories that contain data, full checkouts are problematic. Many git-for-data solutions create a new copy of the entire dataset for each commit, and, to my knowledge, none of them allow contributing to a data repo without a full checkout. The video presents a workflow that does not require full checkouts of the datasets and still allows committing changes in Git. Specifically, it becomes possible to check out only kilobytes in order to commit changes to a 130 GB repository, versions included. Note that only diffs are committed, at the row, column, and cell level, so the diff shown in the GUI will look odd: the GUI treats the previous diff as the file to compare against the new one, when in fact both are just diffs.

The goal of the Data Manager is to version datasets, and structured data in general, in a storage-efficient way, and to make it easy to identify the dataset snapshots that need to be pulled for processing and deploy them to S3. Each snapshot is identified by repository and commit SHA (and optionally a tag). S3 is also used to store heavy files, which Git commits then point to by reference rather than by URL.

The no-full-checkout workflow shown applies naturally to adding data and can be extended to edits or deletions, provided the old data is known. Knowing the old data makes it possible to create bidirectional diffs, which allow navigating Git history both forward and backward; this is useful when caching snapshots. For now, the burden of checking out and building snapshots from the diff history falls on localhost, but that may change, as mentioned in the video. Smart navigation of Git history from the nearest available snapshot, building snapshots with Spark, and other ways to save on data transfer and compute are being evaluated.
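
The post describes committing only cell-level diffs and notes that edits and deletions require knowing the old data so that diffs are bidirectional. Below is a minimal Python sketch of that idea, not the Data Manager's actual implementation. It assumes a table is held in memory as a dict mapping a row key to a dict of column values; the names `CellChange`, `diff`, `apply`, `checkout`, and the `ABSENT` sentinel are hypothetical. Each change stores both the old and the new value, so the same diff can be replayed forward or backward, and `checkout` rebuilds a commit from the nearest cached snapshot.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Assumed in-memory model: {row_key: {column: value}}, every row having at least
# one cell. This is an illustration, not the Data Manager's storage format.
Table = Dict[str, Dict[str, Any]]

# Hypothetical sentinel meaning "this cell does not exist on this side of the diff".
ABSENT: Any = object()


@dataclass(frozen=True)
class CellChange:
    row: str
    column: str
    old: Any  # value before the commit; needed to replay the diff backward
    new: Any  # value after the commit; needed to replay the diff forward


def diff(old: Table, new: Table) -> List[CellChange]:
    """Cell-level diff recording both sides, so it can be applied in either direction."""
    changes = []
    for row in sorted(old.keys() | new.keys()):
        old_row, new_row = old.get(row, {}), new.get(row, {})
        for column in sorted(old_row.keys() | new_row.keys()):
            before = old_row.get(column, ABSENT)
            after = new_row.get(column, ABSENT)
            if before is not after and before != after:
                changes.append(CellChange(row, column, before, after))
    return changes


def apply(table: Table, changes: List[CellChange], forward: bool = True) -> Table:
    """Return a new table with `changes` replayed forward (old -> new) or backward."""
    result = {row: dict(cells) for row, cells in table.items()}
    for ch in changes:
        expected, target = (ch.old, ch.new) if forward else (ch.new, ch.old)
        cells = result.setdefault(ch.row, {})
        current = cells.get(ch.column, ABSENT)
        # Refuse to apply a diff to data it was not computed against.
        if current is not expected and current != expected:
            raise ValueError(f"conflict at ({ch.row!r}, {ch.column!r})")
        if target is ABSENT:
            cells.pop(ch.column, None)
            if not cells:  # drop rows left with no cells
                del result[ch.row]
        else:
            cells[ch.column] = target
    return result


def checkout(snapshots: Dict[int, Table], history: List[List[CellChange]], target: int) -> Table:
    """Rebuild commit `target` from the nearest cached snapshot.

    history[i] is the diff from commit i to commit i + 1; snapshots maps
    commit indices to full tables that have already been materialized.
    """
    start = min(snapshots, key=lambda i: abs(i - target))
    table = snapshots[start]
    for i in range(start, target):  # walk forward (empty if start >= target)
        table = apply(table, history[i], forward=True)
    for i in range(start - 1, target - 1, -1):  # walk backward (empty if start <= target)
        table = apply(table, history[i], forward=False)
    return table
```

For example, if only commit 2 is cached, `checkout({2: table_at_2}, history, 0)` replays `history[1]` and then `history[0]` backward. The conflict check in `apply` is one way to surface the post's requirement that the old data be known: a diff recorded without old values could only be replayed forward.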

This paradigm makes it possible to hibernate or clean up history on S3 for datasets that are no longer needed to build snapshots, such as deleted datasets, provided snapshots of earlier commits are not needed. Individual data entries could also be removed for GDPR compliance using S3 object versioning, orthogonally to Git. The prototype already solves the pain point I built it for: it was impossible to (1) uniquely identify and (2) make available behind an API multiple versions of a collection of datasets and config parameters, (3) without overburdening HDDs with small but frequent changes to any of the datasets in the repo, and (4) while still being able to see the diffs in Git for each commit, to enable collaborative discussion and reverting or further editing if necessary.

Some background: I am building natural language AI algorithms that (a) are easily retrainable on editable training datasets, meaning that changes or deletions in the training data are reflected quickly, without traces of past training and without retraining the entire language model (which sounds impossible), and (b) trace their decisions back to individual training data. LLMs have fixed training datasets, whereas editable datasets call for a system that manages data efficiently. I also wanted something that integrates naturally with common, tried-and-tested tools such as Git, S3, and MySQL; hence the Data Manager.

I am considering open-sourcing it: is that the best way to go? Which license should I choose?
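
Points (1) and (2) rely on a stable identifier for each version of a dataset collection, and the post identifies a snapshot by repository and commit SHA, optionally with a tag. The sketch below shows one way to model such an identifier; the class name `SnapshotId` and the S3 key layout `<root>/<repository>/<commit_sha>/` are assumptions for illustration, not the Data Manager's API. Here the tag is treated as a human-readable label excluded from equality, because a Git tag can be re-pointed to another commit while a commit SHA always names the same commit.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Full-length Git object IDs: 40 hex chars (SHA-1) or 64 (SHA-256 repositories).
_FULL_SHA = re.compile(r"[0-9a-f]{40}|[0-9a-f]{64}")


@dataclass(frozen=True)
class SnapshotId:
    """Hypothetical identifier for a dataset snapshot: repository + commit SHA."""

    repository: str
    commit_sha: str
    # Optional human-readable label; excluded from equality and hashing.
    tag: Optional[str] = field(default=None, compare=False)

    def __post_init__(self) -> None:
        if not self.repository:
            raise ValueError("repository must be non-empty")
        # Require a full SHA: abbreviated SHAs can become ambiguous as history grows.
        if not _FULL_SHA.fullmatch(self.commit_sha):
            raise ValueError(f"not a full lowercase hex commit SHA: {self.commit_sha!r}")

    def s3_prefix(self, root: str = "snapshots") -> str:
        """Assumed S3 key layout: <root>/<repository>/<commit_sha>/"""
        return f"{root}/{self.repository}/{self.commit_sha}/"

    def __str__(self) -> str:
        label = f" ({self.tag})" if self.tag else ""
        return f"{self.repository}@{self.commit_sha[:12]}{label}"
```

With this design, a service serving snapshots behind an API would resolve a request to the same `SnapshotId`, and therefore the same S3 prefix, whether or not a tag was supplied.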

Copyright © latest news | Powered by Blogger