Delete duplicate JSON file based on one of the attributes

Sibgha :

I have two directories on my Linux system, /dir and /dir2.

Both have more than 4000 JSON files. The JSON content of every file is like

{
   "someattribute":"someValue",
   "url":[
      "https://www.someUrl.com/xyz"
   ],
   "someattribute":"someValue"
}

Note that url is an array, but it always contains one element (the url).

The url makes a file unique. If a file in /dir and a file in /dir2 share the same url, one of them is a duplicate and needs to be deleted.

I want to automate this operation, preferably with a shell command. Any suggestions on how I should go about it?

oguz ismail :

Use jq to get a list of duplicates:

jq -nr 'foreach inputs.url[0] as $u (
  {}; .[$u] += 1; if .[$u] > 1
  then input_filename
  else empty end
)' dir/*.json dir2/*.json
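To see what this produces, here is a small self-contained sketch (the /tmp/demo paths and example.com URLs are made up for illustration; it assumes jq is installed). The `foreach` keeps a running count per url and emits `input_filename` only from the second occurrence onward, so the first file seen with each url survives:

```shell
# Build two sample directories with one duplicated url between them.
mkdir -p /tmp/demo/dir /tmp/demo/dir2
printf '{"url":["https://example.com/a"]}\n' > /tmp/demo/dir/a.json
printf '{"url":["https://example.com/b"]}\n' > /tmp/demo/dir/b.json
printf '{"url":["https://example.com/a"]}\n' > /tmp/demo/dir2/a.json  # same url as dir/a.json

# Count each url as files are read; print the filename once a url repeats.
dupes=$(jq -nr 'foreach inputs.url[0] as $u (
  {}; .[$u] += 1; if .[$u] > 1
  then input_filename
  else empty end
)' /tmp/demo/dir/*.json /tmp/demo/dir2/*.json)

echo "$dupes"  # → /tmp/demo/dir2/a.json
```

Note that files are read in the order the shell expands the globs, so whichever copy comes first wins and the later copies are the ones reported.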

And to delete them, pipe the above command's output to xargs:

xargs -d $'\n' rm --

or, for compatibility with non-GNU xargs that has -0 but not -d:

tr '\n' '\0' | xargs -0 rm --

Note that filenames must not contain line feeds.
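Putting the two steps together, an end-to-end sketch (the /tmp/demo2 paths and sample files are invented for illustration; try it on throwaway data before pointing it at real directories):

```shell
# Sample data: dir2/a.json duplicates the url in dir/a.json.
mkdir -p /tmp/demo2/dir /tmp/demo2/dir2
printf '{"url":["https://example.com/a"]}\n' > /tmp/demo2/dir/a.json
printf '{"url":["https://example.com/b"]}\n' > /tmp/demo2/dir/b.json
printf '{"url":["https://example.com/a"]}\n' > /tmp/demo2/dir2/a.json

# List duplicates, NUL-terminate the names, and remove them.
jq -nr 'foreach inputs.url[0] as $u (
  {}; .[$u] += 1; if .[$u] > 1
  then input_filename
  else empty end
)' /tmp/demo2/dir/*.json /tmp/demo2/dir2/*.json |
  tr '\n' '\0' | xargs -0 rm --
```

After this runs, only the duplicate (/tmp/demo2/dir2/a.json in this example) is gone; the first copy of each url is kept.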
